Computer programs that simulate human long-term memory are reviewed. A
model of memory is described that may be constructed from the various
programming efforts. Memory may be conceptualized as a large network with
labeled links, where the nodes refer to ideas, and the links to the relations 
between those ideas. A model of this kind appears capable of giving meaning- 
ful answers to factual questions. The model may be so specified as to manifest 
spontaneous activity and local activity variations based upon excitation of 
associated contents; this explains context effects and other features. Retrieval,
in the model, is effected by means of organized retrieval cues and complex
information-retrieval strategies, which later permit the model to respond to
cues differing from the original stimuli.


Computer simulation has as one of its aims 
the verification of psychological theories. In 
actual fact, it has manifested itself primarily 
as a tool for the development of such theories. 
The construction of automatons that are 
capable of achieving a given kind of per- 
formance implies the formation of a theory of 
that performance. Such a theory is plausible 
as a psychological theory to the extent that 
constraints, derived from psychological data 
or from general biological considerations, are 
incorporated. This is the main function that 
psychological information has, at present, in 
automaton construction; the data are far too 
limited to permit straightforward translation 
into working models. Precisely for that reason 
is simulation a source of psychological
hypotheses: Constructing a model requires the
invention of mechanisms whose value as
possible psychological mechanisms can then be
examined.

1 An earlier version of this paper was prepared
for the Symposium on Memory of the Association
de Psychologie Scientifique de Langue Française,
Geneva, 1968. The research was supported by a grant
from the Netherlands Organization for the Advancement
of Pure Research (Z.W.O.). The author
acknowledges his extensive indebtedness to his
colleagues of the Research Project on Thought and
Memory, in particular Lambert Meertens, and to
George Baylor, Université de Montréal.

2 Requests for reprints should be sent to Nico H.
Frijda, Psychologisch Laboratorium der Universiteit
van Amsterdam, Amsterdam C, Netherlands.

The present study intends to review the 
work that has been performed toward such 
construction. It does so by trying to outline a 
general model of human memory, with its 
variants and problems, as it may be synthe- 
sized from computer programs that have 
memory simulation as their explicit aim or 
that are relevant to that aim. The main thesis 
of this study may well be said to be that such 
synthesis is in fact possible and fruitful and 
that, contrary to Hunt’s (1968) opinion, the
programs, taken together, do yield a model
of human cognitive activity.

The studies considered relevant do not gen- 
erally appear under the heading of “memory 
simulation.” The programs are sometimes 
referred to as “semantic machines.” In fact, 
reports on several of the major programs are 
collected by Minsky (1968) under the title 
Semantic Information Processing; the programs
concerned are those of Black, Bobrow,
Evans, Raphael, and Quillian (which all
appeared originally between 1964 and 1966).
The programs are also called “question-answering
programs,” “fact-retrieval programs,”
or “natural language programs” and
have been reviewed under such headings
(Hunt, 1968; Simmons, 1965, 1970); several
programs, indeed, have the processing of
natural language inputs as one of their primary
aims (Green, Wolf, Chomsky, & Laughery,
1961; Lindsay, 1963; Quillian, 1969;
Thompson, 1966). Other programs, again, are
concerned with complex information process- 
ing in domains that happen to need a com- 
plex, lifelike data base, such as opinion change 
(Abelson, 1963; Abelson & Carroll, 1965) or 
neurotic belief-distortion (Colby, 1965). 

“Memory” in the present study is under- 
stood as human memory; and human memory 
as “human information storage and retrieval.” 
The study leaves aside the topics of physio- 
logical mechanisms and their simulation, as 
these may be found in neural net studies;
nor does it concern itself with other basic 
learning mechanisms such as conditioning or 
pattern perception per se. 

Human memory is defined here as a system 
with the following properties; these proper- 
ties are required of a system to qualify as a 
human or humanlike memory: 


1. Content-addressable storage. Human (or 
animal) information retrieval usually occurs 
in response to information that is similar to, 
or associated with, the retrieved information. 
No knowledge of the topographic location of 
this retrieved information in the memory store 
is involved.


2. A capacity to retain factual information. 
Information is s 


or facts. 


3. Inference potential. Information can be
retrieved that has never been stored as such
but that is implicit in information which has
been stored. Not only is inferential capability
a dominant characteristic of human information
utilization (Bartlett, 1932; Selz,
1922); simple fact retrieval, even in a
documentation system, just requires such a
capability (Cooper, 1964).


4. Flexibility of retrieval potential. Infor- 
mation can be retrieved even when the sys- 
tem is presented with retrieval cues differing 
drastically from the stimuli that gave rise to 
the stored information. Verbal descriptions 
may evoke memories stemming from visual 
impressions. Cues in English may trigger
recollections in Dutch.

These four properties, or capabilities,
determine, to a large extent, the nature of the
models that have been developed and that
are discussed in this study. They have
prompted the structures to be described. The
first three have shaped the conception of the
memory store, and the last one, the access and
retrieval procedures.

The four capabilities do not uniquely
determine the model to be developed; they can
be realized by quite different models. Additional
constraints have to be introduced to obtain a
psychologically plausible model. These
constraints will have to be derived from
psychological data or from general theoretical
considerations. Several of these are mentioned in
the course of this review. One merits mention
at this point. The memory store should
possess a high degree of structural simplicity.
The number of its principles of construction
or composition should be as small as possible.
There is little psychological evidence for it
to be otherwise; the psychology of learning
traditionally tries to explain memory in terms
of one, or very few, basic concepts. In addition,
although it may often be tempting to
introduce a variety of structural principles,
structural simplicity of the memory store is
a prerequisite for generality and simplicity of
the processes operating upon that store (cf.
Ernst & Newell, 1967, 1969). For this reason,
this constraint has been one of the guiding
forces in most of the relevant work.

This review discusses, one after the other,
the various components that one appears to
need when constructing a memory system
with the required capabilities. These
components are the following:


1. An information store, with specific
principles of information representation and
properties of organization;

2. Recognition mechanisms, which assure
the communication between input and the 
memory store; 

3. Input transformation mechanisms, which 
convert incoming information into internal 
representations; 

4. Acquisition mechanisms, which imply 
storage rules; 

5. Basic retrieval mechanisms; 

6. Utilization procedures; 

7. Output construction mechanisms. 


After discussing these components, some com- 
ments are made on the topic of forgetting. 
The study concludes with a summary over- 
view of the model and a brief evaluation of 
its empirical status. 


INFORMATION STORE: AN ASSOCIATION 
NETWORK 


The main component of a memory system 
is, obviously, the information store. The capa- 
bilities that are required of a humanlike mem- 
ory already imply one of its basic properties: 
associative organization. Content-addressable 
storage can only be realized in such a manner. 

The term “association” is used in this re- 
view in the purely descriptive sense. An item 
is said to be associated to another item if 
actualization or presentation of the first gives 
access to the other; that other item then may 
or may not actually be evoked. Association, 
as the concept is used here, has no implica- 
tion whatsoever concerning the genesis of this 
linkage. In this sense, then, associative struc- 
ture is the basic feature of memory. Contents 
are accessible by way of other contents to 
which they are linked, or by way of contents 
or stimuli that resemble them (the classical 
association by similarity). All programs that 
intend to simulate human memory, indeed, 
have their stores organized in this fashion. 

The associative organization of memory 
stores, in simulation programs, has not been 
prompted primarily by a desire to copy hu- 
man memory. An associative structure ap- 
pears indispensable when it comes to han- 
dling large collections of information of 
changing content, unorderly composition, and 
multiple purpose. Associative memories have 
been developed right from the start of work 
on artificial intelligence, out of that work’s 


own needs. The development of complex in- 
formation processing has hinged upon the 
construction of so-called list-processing lan- 
guages like IPL (Newell & Simon, 1963; cf. 
also Green, 1963) or LISP (McCarthy, Abra- 
hams, Edwards, Hart, & Levin, 1962). Spe- 
cial associative systems have been evolved 
for information representation problems, such 
as AMPPL (Findler, 1968) or LEAP (Rov- 
ner & Feldman, 1969), and hardware has 
been developed for the same purpose (e.g., 
Giuliano, 1967; for a review of this topic, see 
Findler, 1968). 

An information store organized in an associative
fashion consists of a network of elements.
From each of the nodes, links lead to
any number of other nodes—the associations 
of the first node. Technically such networks 
usually consist of list structures. A node is 
represented by a list and its associations by 
the element of the list. A special element on 
the list may refer to a special list, a “descrip- 
tion list," containing the properties of the 
node itself. Each list element may itself be a 
list, and so on. Obviously, quite involved in- 
formation networks can be constructed, par- 
ticularly since list structures may link back 
into themselves. The orderly arrangement of 
the associated elements on a list need not be 
given any implication for the memory model; 
processing may disregard it. This is perfectly 
admissible, since list structures are only one 
kind of realization of the network; a network 
can also be implemented by more direct 
means, such as a loose collection of inter- 
locking element pairs or triples (as in the 
LEAP system). On the other hand, the order 
on a list, or some other index, can be made to 
represent "associative strength" of the link 
concerned, as in the programs by Reitman, 
Grove, and Shoup (1964) and Hintzman 
(1968). 
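
The list-structure realization described above can be illustrated with a small sketch. It is not taken from any of the programs cited; the class name `Node`, its fields, and the rule of moving a link forward to mark greater "associative strength" are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison, so cyclic structures are safe
class Node:
    """A network node held as a list structure (hypothetical layout)."""
    name: str
    description: dict = field(default_factory=dict)    # the "description list"
    associations: list = field(default_factory=list)   # ordered links to other nodes

    def associate(self, other: "Node") -> None:
        """Append a link unless it is already present."""
        if other not in self.associations:
            self.associations.append(other)

    def strengthen(self, other: "Node") -> None:
        """Move a link one place toward the front; order stands for strength."""
        i = self.associations.index(other)
        if i > 0:
            self.associations[i - 1], self.associations[i] = (
                self.associations[i], self.associations[i - 1])


dog, animal, bark = Node("dog"), Node("animal"), Node("bark")
dog.description["part_of_speech"] = "noun"
dog.associate(animal)
dog.associate(bark)
animal.associate(dog)      # list structures may link back into themselves
dog.strengthen(bark)       # "bark" now precedes "animal" on dog's list
print([n.name for n in dog.associations])
```

A program that disregards list order would simply never call `strengthen`; the same data then serve as an unordered set of associations.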


INFORMATION STORE: A RELATIONAL
NETWORK


Associative structure as such, however, is 
not sufficient to obtain the desired capabili- 
ties in a memory system. In addition, at 
least three other characteristics appear 
needed. The memory store should also be 
(a) a relational structure: the elements
should be linked by means of specified
relations; (b) a structure with hierarchical
features: sets of elements should be capable of
functioning as a whole as new elements, and
thus of being linked as a whole to other
elements; and (c) a structure of implicit
information: the information should be stored in
such a way that information implicit in it
can be represented and can be used.

Fig. 1. Fragment of information network.

Relational Structure 


From the stand 
1913, 1964; cf, al 


System in which, say, “dog” is associated to 
“is a member of the class of,” and this latter
notion to “animal” (as well as to "dog" or 
) does not have 


[...] question-answering computer programs
employ [...] a mode of data representation
[...] to the notions of “schema”
(Bartlett, 1932; Piaget, 1937), of “Gestalt”
(Köhler, 1929), or of “relational fact”
(Sachverhalt, Selz, 1913).
In all those programs the information store
consists of a network of labeled links, a
relational network (see Figure 1). The basic
information element can be described as a
concept with a predicate, or an entity with a
property, but most generally as a pair of
elements connected by a relation: “rose, color,
red”; “[...] Powers, oppose, anticolonial
[...]” (example of Abelson & Carroll, 1965).
This functional element may be called an
informational molecule and may be symbolized as
(A,R,B). It has probably been first formulated
explicitly and used as such by [...]
(1963). It is explicitly the functional element
in the fact-retrieval programs of Abelson
(Abelson, 1963; Abelson & Carroll, 1965),
Black (1968), Cooper (1964), Frijda and
Meertens (1969), Green et al. (1961),
Lindsay (1963), Green and Raphael (1968),
Raphael (1968), and Simmons (Simmons,
Burger, & Long, 1966; Simmons, Burger, &
Schwarcz, 1968). In those programs it
represents relational facts or rules. The same
representation is also applied as the standard
representation format for the various kinds of
data, needed in a general problem-solving
program: objects, operators, constraints, goal
structures, and criteria (Ernst & Newell,
1969). Similarly, it can be used to represent 
commands, questions, or actions referring to 
overt activity of the system, such as displac- 
ing objects (Becker, 1969). The A,R,B com- 
plexes can be considered as cognitive schemas, 
and the relational network as a huge collec- 
tion of such schemas, overlapping in their As 
and Bs. In the resulting network, there exist 
as many kinds of link as the number of rela- 
tions that the system distinguishes, rather 
than the one kind of link of the classical 
model. The classical “association by con- 
tiguity" (except for being, maybe, a mecha- 
nism by means of which A,R,Bs are gen- 
erated) is just one of these kinds of link: the 
relation of simultaneity or of “followed by.”

The network can be described as a directed 
graph (Kochen, 1963; Simmons, 1966; Tesler,
Enea, & Colby, 1968): the relations impose
an order upon the linked elements. Each
node A may be linked to several elements
B1, B2, B3, etc., by the relations R1, R2, R3,
etc. Element B is linked to elements C1, C2,
C3, etc., by relations R'1, R'2, R'3, etc. A and
B may be linked by more than one direct rela- 
tion: “a bird, has as parts, wings”; and “a 
bird, propels itself by means of, wings” (see 
Figure 1). 
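
A minimal sketch of such a directed, labeled graph follows. The class `RelationalNetwork` and its method names are hypothetical and do not reproduce the data structures of any cited program; the triples are the examples given in the text.

```python
from collections import defaultdict


class RelationalNetwork:
    """Directed, labeled graph stored as (A, R, B) triples."""

    def __init__(self):
        self.triples = set()
        self.outgoing = defaultdict(set)   # A -> {(R, B), ...}

    def add(self, a, r, b):
        """Store the informational molecule (a, r, b)."""
        self.triples.add((a, r, b))
        self.outgoing[a].add((r, b))

    def relata(self, a, r):
        """All B such that (a, r, B) is stored."""
        return {b for (rel, b) in self.outgoing[a] if rel == r}

    def relations_between(self, a, b):
        """All R that link a directly to b; there may be several."""
        return {rel for (rel, x) in self.outgoing[a] if x == b}


net = RelationalNetwork()
net.add("rose", "color", "red")
net.add("bird", "has-as-part", "wings")
net.add("bird", "propels-itself-by", "wings")

print(net.relata("rose", "color"))
print(net.relations_between("bird", "wings"))
```

Because links are indexed from A only, the sketch is one-way: asking what is linked to "wings" would require either a second index or a scan of all triples.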

Programs with this kind of data structure 
are in fact capable of answering factual ques- 
tions or affirming the truth of correct state- 
ments (Black, 1968; Craig, Berezner, Car- 
ney, & Longyear, 1966; Green & Raphael, 
1968; Lindsay, 1963; Raphael, 1968; Slagle, 
1965), solving verbal problems such as verbal 
analogies (Frijda & Meertens, 1969; Reit- 
man et al., 1964), and performing other sorts 
of information manipulation, such as main- 
taining a dialogue with a human subject 
(Colby, 1967; Colby & Enea, 1967) or as- 
serting the credibility of assertions (Abelson 
& Carroll, 1965). Its utility for manipulating 
complex data structures has led the A,R,B 
triple to become the basic structure in the 
LEAP programming language, which has been 
applied to graphic design problems (Rovner 
& Feldman, 1969).


Structure with Hierarchical Features 


The associations or, rather, relata of each 
node constitute the meaning of the concept or 
idea concerned: A concept is, or can be con- 
sidered as, a bundle of properties and other 
relata. Each element of this meaning is itself 
defined by its environment in the network, 
and so on; the network clearly embodies the 
hierarchical structure of knowledge. Quillian 
(1968) defined the immediate surroundings 
of a node as constituting its “immediate defi- 
nition,” and the entire field, accessible from 
a given node, as its “full concept.” In fact, if 
processing procedures are set to work in this 
network, they may closely mirror the human 
process of knowledge evocation: enumerating 
first the closest constituents of a concept, 
then elaborating those, etc. The implications 
of this structure for the concept of meaning 
are, incidentally, interesting. The meaning of 
a concept can be defined as that selection 
from a concept’s “full concept” that is
focused upon by the task at hand; the meaning of
a concept is, in a sense, as variable as those 
tasks. 

The network is hierarchical in still an- 
other respect. Each cognitive schema of the 
form A,R,B can itself function as a unity, 
and enter as an element in another A,R,B 
complex: A,R(A’,R’,B’): “Russia, governs 
(Cuba, overthrows, Latin America)"; in 
other words, Russia governs the overthrow of 
Latin America by Cuba (example from Abel- 
son & Carroll, 1965). The composite element 
is, thus, a subnetwork of the memory net- 
work, of unrestricted complexity. Obviously, 
quite elaborate informational structures can 
be represented and built up in this manner. 
Although the mentioned hierarchical order- 
ings may be traced in the network, this net- 
work as such is not a hierarchical structure. 
It is “a general graph rather than a tree 
[Quillian, 1968, p. 29]," because network 
nodes may at any point link back into nodes 
that are in some way “above” them. 
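
The distinction between a node's "immediate definition" and its "full concept," and the use of a whole triple as an element of another triple, can be sketched as follows. The function names and the small set of triples are assumptions for illustration; only the Russia example comes from the text.

```python
from collections import deque

# Triples are plain tuples; a triple may itself appear as an element.
TRIPLES = [
    ("plant", "is-a", "structure"),
    ("structure", "has-property", "organized"),
    ("plant", "has-as-part", "leaf"),
    ("leaf", "color", "green"),
    ("leaf", "part-of", "plant"),          # links back "upward": a graph, not a tree
    ("green", "is-a", "color-value"),
    ("Russia", "governs", ("Cuba", "overthrows", "Latin America")),
]


def immediate_definition(node):
    """The labeled links leaving `node` directly."""
    return [(r, b) for (a, r, b) in TRIPLES if a == node]


def full_concept(node):
    """Every element reachable from `node`, in breadth-first order."""
    seen, order, queue = {node}, [], deque([node])
    while queue:
        current = queue.popleft()
        for _, b in immediate_definition(current):
            # A nested triple also contributes its own two end elements.
            targets = [b] + ([b[0], b[2]] if isinstance(b, tuple) else [])
            for t in targets:
                if t not in seen:
                    seen.add(t)
                    order.append(t)
                    queue.append(t)
    return order


print(immediate_definition("plant"))
print(full_concept("plant"))
```

The breadth-first order mirrors the evocation described above: the closest constituents come first, and their own constituents follow. The `seen` set is what keeps the back link from "leaf" to "plant" from causing an endless loop.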


Structure of Implicit Information 


The memory network as described has little
or no structural organization. The organization
that does exist is implicit in the pattern
of linkages between nodes, which may be
direct or indirect over other nodes, and which
may give rise to important local differences in
network density.

More important, however, is the nonstruc- 
tural, implicit organization that is determined 
by the nature of the relations constituting the 
network links. The relations, too, have a 
meaning that is defined in the network, as 
their environment of connected nodes. Re- 
lations are, in part, just facts, like any other 
node. In part, however, their meaning con- 
sists of rules of inference, which can be read 
as other facts, but also as operations that the 
system may execute upon its data. Also part 
of the meaning of the relations is the probable 
or possible nature of their arguments, in the 
sense of the semanticist’s selectional restric- 
tions, or in more flexible ways (Quillian, 
1968). 

Among the rules of inference are those that
constitute the meaning of the logical relations
and express their properties such as
transitivities, symmetries, and reflexivities. Obviously,
the system's field of understanding is
thus considerably enlarged. For instance, the
program by Frijda and Meertens (1969) contains
the relation of synonymy. The program
knows that an element in an A,R,B molecule
may be replaced by every element which is
its synonym; links then may become accessible
between all associations of the first
element [...]. Colby's (1965) [...]
of certain neurotic [...]
relations of in[...].
It knows that [...]
in Colby's simulations, that can be resolved
by a process that reduces the opposition: the
system ends by believing that it just dislikes
its father. Quite generally, the relation
“inverse” that may hold between [...]

When Raphael's (1968) program is asked, “How
many fingers has John?” it answers “10,” given the
information that John is a boy, that every
boy is a person, that every person has two
hands, and every hand five fingers; the reasoning
obviously involves more than the transitivity
of class inclusion: it also involves the
rule that properties of classes are properties
of their elements, which is another meaning
component of the set-superset relation; and,
of course, some property of numerical relations.
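
The three ingredients of the finger-counting example (transitive class inclusion, inheritance of class properties by members, and a numerical rule) can be sketched as follows. This is an illustration under assumed names (`IS_A`, `HAS`, `count_parts`), not a reconstruction of Raphael's program, and it assumes the part-whole facts contain no cycles.

```python
# Facts in the spirit of the example; the table names are hypothetical.
IS_A = {"John": "boy", "boy": "person"}                 # element or subset -> superset
HAS = {("person", "hand"): 2, ("hand", "finger"): 5}    # (owner, part) -> parts per owner


def supersets(x):
    """x followed by every class reached through transitive class inclusion."""
    chain = [x]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def count_parts(owner, part):
    """How many `part` does `owner` have? None if this cannot be derived."""
    for cls in supersets(owner):              # properties of classes hold for members
        for (o, p), n in HAS.items():
            if o != cls:
                continue
            if p == part:
                return n
            sub = count_parts(p, part)        # go through an intermediate part
            if sub is not None:
                return n * sub                # numerical rule: counts multiply
    return None


print(count_parts("John", "finger"))
```

Nothing about John's fingers is stored; the answer is assembled from the meaning given to the class and part relations, which is the point of the paragraph above.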

Spatial, temporal, and causal relations are
treated in the same fashion. Raphael's program,
again, answers correctly to, “Is the ash
tray to the right of the cigar?” given that the
pencil is to the left of the cigar, and the ash
tray to the right of the pencil. Evans ([...],
1968) employed similar principles to make a
program solve figural analogy problems.
Raphael's program also handles part-whole
relations; those of Lindsay (1963) and of
Findler and McKinzie (1969b), kinship relations
with their interconnections. Other inference
rules may express just plausible
implications, for instance, psychological
relationships. Abelson and Reich (1969) introduced
the notion of “implicational molecules”
to indicate rules such as “(A, does,
X), (X, causes, Y), implies, (A, wants, Y),”
a component of the meaning of purpose,
“wants,” “causes,” and/or “does.” As may
be evident, the system may embody deductive
as well as inductive inference. In fact, for the
system there is no difference in the data [...]
for both. Even logical inference is, for the
system, as logical as the system [...]
it. If the inner representation of a relation's
meaning is changed, the system's understanding
and action are changed. It will act more
logical or intelligent if that meaning is
elaborated.
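
The implicational molecule quoted above can be applied mechanically, as the following sketch shows. For brevity the rule is hard-coded in a function, whereas the programs discussed are described as holding such rules as data; the function name and the sample facts are invented for illustration.

```python
def apply_purpose_rule(facts):
    """Add (A, wants, Y) whenever (A, does, X) and (X, causes, Y) are both stored."""
    derived = set(facts)
    for (a, r1, x) in facts:
        if r1 != "does":
            continue
        for (x2, r2, y) in facts:
            if r2 == "causes" and x2 == x:
                derived.add((a, "wants", y))   # plausible, not logically certain
    return derived


# Made-up facts, used only to exercise the rule.
facts = {
    ("Joe", "does", "studying"),
    ("studying", "causes", "passing the exam"),
}

result = apply_purpose_rule(facts)
print(("Joe", "wants", "passing the exam") in result)
```

The derived triple has the same form as the stored ones, so it can be stored and used as a premise in turn. The inference is only plausible: changing the rule changes what the system takes "wants" to mean.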

The indicated meaning of the relations
forms part of the total network in the same
way as that of the elements related. In some
programs this is true in actual fact; the
meanings have been embedded in the
list structure (Ash & Sibley; 3 Black, 1968;
Elliott; 4 Frijda & Meertens, 1969; Slagle,
1965; Tesler et al., 1968). A relation, and
meaning components of a relation, can be
read in and retained like any other kind of
data; consequently, a relation may be
interpreted just like any other element
(Frijda & Meertens, 1969; Shapiro & Woodmansen,
1969). There is, in those programs,
no structural difference between relations
and other elements, and it would be meaningless
to refer to two kinds of elements.
Only because of some of their properties, and
of their function in other compound elements,
can they be identified as relations.

3 W. L. Ash & E. H. Sibley. TRAMP: A relational
memory with an associative base. (Tech. Rep. [...])
University of Michigan, 1968.


INFORMATION STORE: GENERAL COMMENTS 
Alternative Realizations of the Model 


The network model as described leaves 
open a number of important aspects. There 
are alternative ways to fill these out, and 
different theories or programs have chosen 
different ways. 

For instance, the links in the network may 
be made one-way or two-way. In most pro- 
grams they are one-way, although in Raph- 
ael's system each new piece of information is 
coded in two symmetrical ways (“A > B" 
yields A > B and B < A); Findler's (1968) 
associative language explicitly allows two-way 
links. Links may be made to vary in strength, 
for instance, as a function of frequency or 
recency of utilization (Hintzman, 1968; Reit- 
man, 1965). The system may be given an 
economical distaste of redundancy, pushing 
each property as high up in the class hier- 
archy as possible and erasing that property 
at lower levels (Quillian, 1968; Raphael, 
1968). Relations, as indicated above, may be 
a few basic ones or as many as in the natural 
language. In most of these aspects informa- 
tion on which a preference can be based is 
lacking, and experiments are needed—experi- 
ments with programs in the first place. One 
of the forms the model may take is so essen- 
tial, however, that it is discussed in some de- 
tail; it is relevant to the general conception 


of memory.

4 R. Elliott. A model for a fact retrieval system.
Unpublished doctoral thesis TNN-42, Computation
Center, University of Texas, Austin, Texas, 1965.
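
One of the alternatives listed above, the "economical distaste of redundancy," can be sketched as follows. The tables and function names are hypothetical, and the hoisting rule (a property shared by all known subclasses is moved to the parent) is an assumption about how such economy might be achieved, not a description of Quillian's or Raphael's procedure.

```python
# Hypothetical class hierarchy and property lists.
IS_A = {"canary": "bird", "robin": "bird", "bird": "animal"}
PROPS = {
    "canary": {("can", "fly"), ("color", "yellow")},
    "robin": {("can", "fly"), ("color", "red-breasted")},
    "bird": set(),
    "animal": {("can", "breathe")},
}


def hoist_shared_properties(parent):
    """Move properties shared by all direct subclasses of `parent` up to `parent`."""
    children = [c for c, p in IS_A.items() if p == parent]
    if not children:
        return
    shared = set.intersection(*(PROPS[c] for c in children))
    PROPS[parent] |= shared
    for c in children:
        PROPS[c] -= shared            # erase the now redundant copies


def properties(node):
    """Own properties plus everything inherited along the class chain."""
    found = set()
    while node is not None:
        found |= PROPS.get(node, set())
        node = IS_A.get(node)
    return found


hoist_shared_properties("bird")
print(PROPS["canary"])
print(properties("canary"))
```

Storage shrinks, but retrieval now has to climb the hierarchy; the sketch makes concrete the exchange of storage effort against processing effort.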


Memory as a Network of Active Semantic 
Elements 


This section title is taken from Reitman 
(1965). The links in the network—and thus 
each A,R,B molecule—may be provided with 
an activity index that influences the proba- 
bility of utilization of that link and molecule. 
Activity indexes may be influenced by a 
number of variables. They may be increased 
by current or recent activity of that link, as 
in Reitman’s program. They may also be in- 
fluenced by activity somewhere in a neighbor- 
ing part of the network; this may evoke the 
entire field of meaning of a concept, a sen- 
tence, or a paragraph, as in Quillian’s pro- 
grams (Quillian, 1968, 1969). An emotion or 
a desire may have similar activating effects. 
For instance, Colby’s program simulates cer- 
tain neurotic phenomena (Colby, 1965; Colby 
& Gilbert, 1964). It incessantly tries to ex- 
press the beliefs in its memory. It selects for 
expression a belief considered important, from 
among that cluster of beliefs with which it is 
at that moment preoccupied. If that belief 
conflicts with other beliefs (as computed 
from affectivity values of the components of 
each belief), efforts at deformation are made; 
a number of deformation mechanisms (“de- 
fense mechanisms”) are available, which vary 
in conflict reduction power. If after all defor- 
mation efforts the degree of conflict remains 
higher than the expression threshold, the sys- 
tem cuts its train of thought, selects a dif- 
ferent belief cluster, and pursues its rumina- 
tions. 

Another program, by Loehlin (1968), day- 
dreams. It selects from its memory those 
ideas that bear some relevance to the “drive” 
with the highest activity index at that mo- 
ment. When an idea is evoked, it decreases 
the intensity of its corresponding drive and, 
thus, indirectly, the activity index of other, 
related ideas. After some time a different drive 
will thereby become dominant, and the sys- 
tem will start to daydream about that other 
domain. Drives, in this model, may be consid- 
ered as network elements, the activity of 
which irradiates toward memories, ideas, or 
associated activities. 
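
The drive-governed selection of ideas just described can be sketched as a short loop. This is an illustration only: the function `daydream`, the integer activity levels, the fixed `satisfaction` decrement, and the sample drives and ideas are all assumptions and are not taken from Loehlin's program.

```python
def daydream(drives, ideas, steps, satisfaction=2):
    """
    drives: dict mapping each drive to its activity level
    ideas:  dict mapping each drive to the ideas relevant to it
    Each step evokes an idea tied to the currently strongest drive,
    then lowers that drive so that another may become dominant.
    """
    drives = dict(drives)                  # work on a copy
    cursor = {d: 0 for d in drives}        # next idea to evoke, per drive
    trace = []
    for _ in range(steps):
        top = max(drives, key=drives.get)
        pool = ideas[top]
        trace.append((top, pool[cursor[top] % len(pool)]))
        cursor[top] += 1
        drives[top] -= satisfaction        # evoking an idea reduces its drive
    return trace


# Made-up drives and ideas, used only to exercise the loop.
trace = daydream(
    drives={"hunger": 10, "affiliation": 7},
    ideas={"hunger": ["bread", "soup"], "affiliation": ["old friend", "party"]},
    steps=5,
)
print([drive for drive, _ in trace])
```

With these numbers the system dwells on one domain for two steps before the other drive takes over; drives never recover in this sketch, which a fuller model would have to provide for.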

Given a system with such activity indexes,
a conception of memory suggests itself that
is different from the static conception usually
held and implied so far. It is a conception
brought forward in detail by Reitman (1965).
Memory may be conceived as a continuously
active system, which tries to express itself
according to the hazards of its associational
linkages and the reinforcement of its emotional
orientations. In such a system, external
stimuli or tasks and intentions act as
influences that restrict,


1953) 
dream 
Sensory deprivation, and drug 
: Such a system 


À activity decay 
Which, so far, have to be rather arbitrary, 


nory may mean either of 
interna] activity, such as 
Where the only output 
Or success of an internal 


nonexternalized 
Such as the emergence of a ment 
the execution of internal 


con- 


t, all Programs use 
i the nodes —interna] 
Ch are references to 


3 » b 
(lodi stored routines. In the prre 
and Frijda and M 
grams, each Jda and Meertens (1969 pro 


attribute “name” 
might be done with 


This has an important consequence for the
conception of the internal representation of
information. Considering
the way of functioning of the model, the entire
memory network is nothing but a [...]
system of access-ways, which ultimately lead
to procedures for the composition of words,
of mental images, or to activities and operations.
The network does not contain a
replica of any previous impression but is, to
some extent, a prescription to reconstruct
them. In this sense, the model is an implementation
of Neisser’s (1967) reconstructive
rather than reproductive memory. It must
be admitted, though, that no program, so far,
has put any effort in realizing the reconstruction
processes involved.


Sufficiency of the Network Model 


As indicated earlier, the model, or rather the
models, manifests a minimum of structural
order or organization. There is one huge,
diffuse network with, possibly, local differences
in density. There is, for instance, no ordering
according to a hierarchy of classes and
subclasses or no distinction between “associations”
and meaning core (an apple “has to
do” with Eve, and “is” a round, sweet fruit).
One may wonder to what extent such a model
can form the basis for intelligent utilization of
stored information. Abelson and Carroll
(1965), and, in particular, Quillian ([...],
1968) have indeed given the class relationship
special status, establishing a structural
difference between subclass-superclass links
and other links, to provide for the dominant
role of this relation in information utilization.
It can be argued, however, that all order in
knowledge is the result of an interaction
between the order information implicit in the
network and the specification of the
information-retrieval task at hand. Hierarchies of
classes can be traced in the network if the
task defines an iterative search for the
class-member relation. Definitions can be produced
if the quest for definitions is understood to
imply a search for properties deemed essential,
or criterial, as determined by further
evaluation processes. The order and coherence
of knowledge [...]
storage of this [...]

Yet, only experiment can prove these
assertions right. So far, few programs have
endeavored to fully implement a memory store
of the kind described. “And” and “or” links 
are present in nearly all of them. These, as 
well as the special class-relation links, may 
just be programming conveniences, however. 
More important is the introduction of two 
other structural differentiations by Quillian. 

Quillian distinguished “type nodes" and 
“token nodes.” A type node represents a 
complex concept; it gives access to a con- 
cept’s “immediate definition.” A token node 
represents the occurrence of another concept 
within such an immediate definition plane; 
each immediate definition plane—each bundle 
of associations—is made up of token nodes 
headed by a type node, and each token node 
points to its type node for further definition.
In network terms this means that to the nodes 
in the network other networks are appended, 
the link to such an appended network being 
distinct from those between nodes within a 
network; Tesler et al. (1968) have given a 
formal description of these matters. 

The need for this type node-token node dis- 
tinction is connected with, among other 
things, the introduction of another type of 
distinct link, that of modification. If the con- 
cept of “plant” is defined as a “living struc- 
ture,” the concept of “structure” is, in this 
connection, modified by that of “living.” Such 
modification is local to the context of a given 
type node and, hence, needs a special struc- 
tural characterization. 
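
The type node and token node distinction, together with a modification that is local to one definition, can be sketched as follows. The class and field names are hypothetical and the layout is a simplification; it does not reproduce Quillian's actual plane format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class TypeNode:
    """Head of a concept's immediate definition plane."""
    name: str
    plane: list = field(default_factory=list)    # token nodes making up the definition


@dataclass(eq=False)
class TokenNode:
    """Occurrence of a concept inside some other concept's plane."""
    type_node: TypeNode                          # pointer to the concept's own definition
    modifier: Optional["TokenNode"] = None       # modification valid in this plane only


living, structure, plant = TypeNode("living"), TypeNode("structure"), TypeNode("plant")
plant.plane.append(TokenNode(structure, modifier=TokenNode(living)))


def describe(t: TypeNode) -> list:
    """Read off a plane, attaching each local modifier to its head."""
    out = []
    for tok in t.plane:
        head = tok.type_node.name
        out.append(f"{tok.modifier.type_node.name} {head}" if tok.modifier else head)
    return out


print(describe(plant))
```

The modifier hangs on the token, not on the type node for "structure," so the concept of "structure" itself is left unmodified everywhere else in the store.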

It is uncertain whether these differentiations
are really indispensable or whether the
knowledge structures concerned can be re- 
produced in a less structured network repre- 
sentation. In many cases they certainly can. 
Quillian (1968) in fact commented upon the 
possibility of replacing modification by la- 
beled links, as meant in this study. It re- 
mains to be examined, however, whether such 
replacement is always feasible and whether 
the resulting structure does not become too 
diffuse to permit efficient processing. Quite 
generally there exists a trade-off between the 
efforts spent in structuring the to-be-stored 
information and the effort needed for processing
during retrieval or problem solving.
How the balance is constituted in the case of
the human system is as yet obscure. 


Some Points of Internal Representation 


A few more words must be said concerning 
information representation at the level of the 
individual node or link. 

Reference has already been made to the 
fact that qualifications of the individual link 
may be necessary. Activity indexes are used 
by Quillian and Reitman; criteriality, inten- 
sity, credibility, or emotional value indexes, 
among others, are found in Quillian and in 
Colby. Quillian also used a special parameter 
to indicate intensity range of attribute values. 
The predicate calculus quantifiers too are 
essential (Coles, 1968; Kochen, 1963), and 
even more refined quantification seems needed 
(cf. Simmons, 1970, p. 20; Woods, 1968). On 
the other hand, all but the quantifiers and ne- 
gation (Black, 1968, has devoted some dis- 
cussion to that) may well turn out to be tech- 
nical conveniences, not really requiring struc- 
tural complications and taking the place of 
what should be ad hoc computations within 
the network. 

For the rest, it should be realized that the 
elements entering an A,R,B molecule may be 
of any imaginable kind. They may be num- 
bers, as the value of attributes like “number” 
or “size.” They may be variables (“formal 
nodes”; Tesler et al., 1968), as in rules and
abstract schemata for which constants may be 
substituted during processing; variables may 
also be used as indications that there is 
“something” of undefined nature (“rose, 
color, x” meaning “a rose has a color”; cf. 
Simmons et al., 1966). Values of attributes 
may represent not a single value but the 
range over which that value may vary (Quil- 
lian, 1968), for instance, to render imprecise 
categories such as “large” or “reddish.” 
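
A minimal sketch of such heterogeneous elements follows. The
`Var` and `Range` classes, the `matches` helper, and all
numeric values are illustrative assumptions of ours; none of
the cited programs is claimed to use this representation.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    """A formal node: stands for 'something' not further specified."""
    name: str


@dataclass(frozen=True)
class Range:
    """An attribute value given as an interval rather than a point."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


Element = Union[str, float, Var, Range]
Molecule = Tuple[Element, Element, Element]  # (A, R, B)

facts = [
    ("rose", "color", Var("x")),            # "a rose has a color"
    ("rose", "number_of_petals", 5.0),      # plain number (made-up figure)
    ("elephant", "size", Range(0.8, 1.0)),  # "large" on a hypothetical 0..1 scale
]


def matches(pattern: Element, value: Element) -> bool:
    """A variable matches anything; a range matches numbers inside it."""
    if isinstance(pattern, Var) or isinstance(value, Var):
        return True
    if isinstance(value, Range) and isinstance(pattern, (int, float)):
        return value.contains(pattern)
    return pattern == value
```

With these definitions, a query value of 0.9 matches the
stored range for "large," while 0.5 does not.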

There is one kind of element which in a 
sense explodes the network system. Elements 
may refer to operations rather than further 
A,R,B complexes; these elements’ meaning is 
located outside the network in the program’s 
bank of routines. This applies to the quanti- 
fiers just mentioned, to output construction, 
and, in particular, to actions referring to 
covert or overt activity of the system (Becker, 
1969) and to the essential meaning aspects
of the more important relations. Many relations
refer to substitution or concatenation
operations, upon which the inference potential
of the system rests. However, apart from the 
fact that operations may in general be given 
the same A,R,B-like structure (the notation 
R[A,B] is one of the standard LISP formats 
for instructions), the distinction between data 
and program is a nonessential technical mat- 
ter, which in some programming languages is 
hardly noticeable; the operations may be 
thought of as forming part of the network. 

So far, reference has been made mostly to
information of a verbal or verbalized nature.
Figural information can in principle be represented
in the same way as an A,R,B network
(e.g., Coles, 1968; Evans, 1964, 1968). In
fact, Evans considered storage of figural information
in terms of properties rather than
of mere coordinates 
gent manipulation of 
is not to deny that complicated problems of 
information representation may well arise. 
Rigorous psychological study of internal rep- 
resentation of experience has hardly begun 
(see Gregg, 1967; Hayes, 1965, 1966; Mi- 
chon, 1968, for some examples). On the other 
hand, we would venture the guess that the 


ementary qualities are 
and should be repre- 


RECOGNITION MECHANISMS:
COMMUNICATION BETWEEN
INPUT AND MEMORY STORE


ES As a memory store, communication 
us 


ing memory trace,” 
two methods are possible. One is exhaustive
memory search; this is implausible in human
memory and, moreover, uneconomical. The
other is “association by similarity” or, in
somewhat more contemporary terms, a process
of resonance (Duncker, 1945; Köhler,
1929). The system has to search for a content
that matches the input in a more or
less direct way. Technical realizations of some
form of resonance process are possible (e.g.,
Willis, 1960) and so is explicit programming
of such a process.

The entry zone of the memory network
may be represented as (and usually is,
technically) a list of names and properties. Each
element of this list indicates a list that
represents the network node (the “type node,” as
Quillian called it) of that element, and
which thus contains the R,B components of
that element. In nearly all programs the
system is entered by the element A in the A,R,B
compounds. Entrance by way of the Bs can
also be provided, though, either by
consistently also storing the inverse relation
(B,R,A) (Raphael, 1968) or by means of
direct pointers from the entry zone to the Bs
(Findler, 1968; Findler & McKinzie, 1969).
Contact with the memory store is made by
means of retrieval cues that consist of names
or of properties. The entry list of names and
properties must be scanned systematically in
order to find the names or properties that
match the cue and thus to gain access to the
corresponding node.
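
The entry list and the storing of inverse relations might be
sketched as below. The `INVERSE` table, the relation names,
and the functions `store` and `lookup` are assumptions made
for illustration; they do not reproduce Raphael's or
Findler's actual code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Entry zone: name -> list of (R, B) pairs, i.e., the node's own list.
EntryList = Dict[str, List[Tuple[str, str]]]

# Hypothetical table of inverse relation names (an assumption of this sketch).
INVERSE = {"has_part": "part_of", "part_of": "has_part"}


def store(entry: EntryList, a: str, r: str, b: str) -> None:
    """Store (A,R,B) under A and, when an inverse is known, (B,R',A) under B."""
    entry[a].append((r, b))
    if r in INVERSE:
        entry[b].append((INVERSE[r], a))


def lookup(entry: EntryList, cue: str) -> List[Tuple[str, str]]:
    """Scan the entry list sequentially for a name that matches the cue."""
    for name, pairs in entry.items():
        if name == cue:
            return pairs
    return []


entry: EntryList = defaultdict(list)
store(entry, "hand", "has_part", "finger")
print(lookup(entry, "finger"))  # [('part_of', 'hand')]
```

Storing the inverse doubles the storage for such relations
but lets the same scanning routine serve entrance by A and
entrance by B.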

If the cue is a compound one (what is the
object which is red, fragrant, a flower, has
thorns?), the requested node is the intersection
of the lists corresponding to the properties.
Reitman (1965) has implemented the
resonance process in such a manner, pointing
out its similarity with Hebb’s notions (Hebb,
1949) on trace evocation. In Reitman’s
ARGUS program, each memory element has
its charge index. Stimulation (e.g., because
one of its properties occurs in the
cue) increases this charge index by a small
quantity. Comparison of the charges of all
elements yields the element with the highest
charge, which is the requested intersection; it
is the element that most resembles the cue.
Systems of this kind are present in pattern
recognition programs such as those of Selfridge
(1959) or Uhr and Vossler (1963; cf.
Uhr, 1963). The charge indexes may, in such
programs, be a function of the occurrence of
properties in the stimulus as well as of the
criteriality of those properties for the memory
element; criteriality usually is a function
of the experienced frequencies of association
between properties and memory elements.

Because each memory element may be said 
to shout with a loudness proportional to its 
charge index, such programs are known as 
“pandemoniums” (Selfridge, 1959). Obvi- 
ously, recognition can occur even if presented 
information and memory element are not 
identical but only similar, and the degree of 
similarity may be measured relative to an 
acceptance threshold. The magnitude of the 
highest charge index may be made to reflect 
recognition certainty or stimulus acceptabil- 
ity. 
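
The charge-index scheme with an acceptance threshold can be
sketched as follows. This is an illustration only: the
function `recognize`, the fixed `increment`, and the toy
memory contents are our assumptions and are not drawn from
ARGUS or from Selfridge's program. Criteriality weights are
omitted for brevity.

```python
from typing import Dict, Optional, Set, Tuple


def recognize(
    cue: Set[str],
    memory: Dict[str, Set[str]],
    threshold: float,
    increment: float = 1.0,
) -> Tuple[Optional[str], float]:
    """Return the element with the highest charge, or None below threshold.

    Assumes `memory` is not empty.
    """
    charges = {name: 0.0 for name in memory}
    # Conceptually parallel: the order of these loops does not affect the result.
    for name, properties in memory.items():
        for prop in cue:
            if prop in properties:
                charges[name] += increment
    best = max(charges, key=charges.get)
    if charges[best] < threshold:
        return None, charges[best]
    return best, charges[best]


memory = {
    "rose": {"red", "fragrant", "flower", "thorns"},
    "tulip": {"red", "flower"},
    "cactus": {"thorns", "green"},
}

full_cue = {"red", "fragrant", "flower", "thorns"}
winner, certainty = recognize(full_cue, memory, threshold=3)
weak, weak_charge = recognize({"red", "thorns"}, memory, threshold=3)
```

The full cue selects "rose" with the maximal charge, whereas
the two-property cue stays below the threshold and is
rejected; the returned charge can serve as a rough index of
recognition certainty.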

These access systems are realizations (or, 
rather, simulations) of the principle of paral- 
lel processing. Although the processing occurs 
in fact sequentially, the sequence is imma- 
terial to the results. The resonance concept, 
or Hebb's notions, clearly imply parallel 
processing. Nevertheless, a sequential proc- 
essing model may appear more plausible in 
some connections. Such a model has been 
developed and applied with interesting re- 
sults by Feigenbaum (1961, 1963, 1970; 
Simon & Feigenbaum, 1964); the model is 
known by the name of EPAM, for Elementary 
Perceiver And Memorizer. 

Feigenbaum argued that, for recognition, it 
is uneconomical to test the presence or ab- 
sence of all properties that stimuli may have. 
Also, it is uneconomical to store redundant 
properties of a given stimulus. The system 
should concern itself only with those prop- 
erties that are needed to identify the stimu- 
lus among the other stimuli with which the 
system is familiar. If a list of nonsense sylla- 
bles is learned in which all letters are differ- 
ent, detection of one letter of each syllable is 
sufficient for discrimination. 

The theory is as follows: Identification of 
an already familiar stimulus, and thus localization
of its trace, is achieved by a sequence
of binary discriminations, each of which tests 
whether a given property is present or not. 
The system is supposed to know an alphabet 
of elementary properties; in the case of 
learning of words or syllables this is the
common alphabet. The system selects a category
of properties, guided in its choice by pref- 
erence rules; for instance, when trying to rec- 
ognize syllables the first letter is attended to 
first, next comes the last one, and then the 
others, from left to right. With the properties 
in the selected category or categories, the 
system constructs a "discrimination net": a 
tree of tests by means of which the stimuli 
are discriminated. The construction of this 
net is governed by those stimuli that happen 
to have been presented. Suppose the system 
has to learn the list of syllables “DAX-PIT.”
Since first letters are regarded first, and since 
the letter D would be sufficient to reach the 
correct traces, the test “first letter D or not?” 
is constructed (Figure 2). 

Suppose the syllable WUT is added to the 
list to be learned. After the negative branch 
of the test “first letter D or not?” the system 
adds the test “first letter W or not?” Adding 
the syllable DAT leads to the construction of 
a test “last letter T or not?” after the posi- 
tive branch of the first D-letter test. New 
tests are inserted automatically when the sys- 
tem detects a recognition error (by noticing 
a discrepancy between the selected trace and 
the stimulus, as kept in short-term storage);
it then takes a second look at the stimulus, 
taking more properties into account. The dis- 
crimination net grows to the extent that newly 
presented information requires this. The 
traces of the stimuli are constructed during 
discrimination learning and deposited at the 
terminal nodes of the net. With each presenta- 
tion of a given stimulus, some fragment is 
added to its trace until this is complete, that 
is, identical to the stimulus (at least in the 
latter version of the model, EPAM III; Fei- 
genbaum & Simon, 1964). 
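
The growth of a discrimination net can be sketched in a few
lines. The code below is a simplification under our own
assumptions, not Feigenbaum's program: the names `Leaf`,
`Test`, and `Net` are hypothetical, all stimuli are assumed
to have the same length, and a complete image is stored at
once instead of being built up fragment by fragment. The
presentation order PIT, DAX, WUT, DAT is chosen so that the
resulting tests coincide with those named in the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Leaf:
    image: Optional[str] = None  # the stored trace, if any


@dataclass
class Test:
    position: int  # which letter is inspected
    letter: str    # "is the letter at `position` equal to `letter`?"
    yes: Union["Test", Leaf]
    no: Union["Test", Leaf]


def attention_order(length: int) -> List[int]:
    """First letter, then last letter, then the others from left to right."""
    if length < 2:
        return [0]
    return [0, length - 1] + list(range(1, length - 1))


class Net:
    def __init__(self) -> None:
        self.root: Union[Test, Leaf] = Leaf()

    def _descend(self, stimulus: str) -> Tuple[Optional[Test], Optional[str], Leaf]:
        """Follow the tests; return the parent, the branch taken, and the leaf."""
        parent, branch, node = None, None, self.root
        while isinstance(node, Test):
            branch = "yes" if stimulus[node.position] == node.letter else "no"
            parent, node = node, getattr(node, branch)
        return parent, branch, node

    def recognize(self, stimulus: str) -> Optional[str]:
        return self._descend(stimulus)[2].image

    def learn(self, stimulus: str) -> None:
        parent, branch, leaf = self._descend(stimulus)
        if leaf.image is None:
            leaf.image = stimulus
            return
        if leaf.image == stimulus:
            return
        # Recognition error: add a test on the first attended letter that differs.
        for pos in attention_order(len(stimulus)):
            if stimulus[pos] != leaf.image[pos]:
                new_test = Test(pos, stimulus[pos], yes=Leaf(stimulus), no=leaf)
                if parent is None:
                    self.root = new_test
                else:
                    setattr(parent, branch, new_test)
                return


net = Net()
for syllable in ["PIT", "DAX", "WUT", "DAT"]:
    net.learn(syllable)
```

After these four presentations the root tests "first letter
D," its negative branch tests "first letter W," and its
positive branch tests "last letter T." A distorted stimulus
such as DUX is sorted to the trace DAX, because the middle
letter is never inspected.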

The notion of a discrimination net has 
great generality of application. Something 
like it will doubtlessly be needed to obtain a 
closer simulation in programs such as those 
of Raphael, Quillian, or Frijda and Meertens. 
Also, it is in general a very useful technique 
for the indexing of a complex and coherent 
body of information (cf. Ernst & Newell, 
1967, 1969; Newell, 1964). 

Fig. 2. Example of an EPAM-discrimination net.

The EPAM recognition model manifests
one implausible aspect. The sequence of
tests is fixed and rigid; this is a consequence
of the essential postulate of sequential processing.
Once the discrimination net has been
constructed, it is difficult to enter it by way of
properties that are not located at the top of
the net, and no change in set or attitude can
change this. As a consequence, recognition on
the basis of partial cues will show large differences
in difficulty according to which properties
are lacking and which are not. In this
respect, and for such recognition tasks, a
parallel processing model would seem more
adequate. Note that this difference between
the two models does not stem from the sequential
or parallel processing as such, but
from a consequence of these processing principles:
in a pandemonium, each input is
tested for the presence or absence (or intensity)
of each possible attribute; in EPAM
the stimulus, together with the attribute that
happens to be the first to be tested, determines
which other attributes will be tested;
this characteristic entails a large part of
EPAM's explanatory power. On the other
hand, the difference with respect to parallel
or sequential processing as such does have
some consequences. In EPAM, recognition
time necessarily is a function of the size of
the set of properties. In a pandemonium, this
may be considered irrelevant to the model.
Also, whereas error in the registration of a
single property will have little consequence
in a pandemonium, it will have grave consequences
in EPAM (cf. Neisser, 1967, p. 73).


INPUT TRANSFORMATION MECHANISMS


A difficulty for the realization of a resonance
process as just described is the very
large variability of equivalent stimuli.
Visual objects represent different proximal
stimuli, depending on angle of presentation,
distance, etc.; objects of the same class may
differ considerably in visual shape. It would
be possible to store each such stimulus
encountered and compare newly presented
information with the whole set, a procedure
known as template matching. Verbal infor- 
mation may be treated similarly. In an early 
program by Simmons (1963) the sentences of 
a text all are stored literally, and further
processing is performed only when the infor- 
mation contained in the sentence is needed. 
Such a scheme represents an unfavorable bal- 
ance between storage requirements and in- 
formation loss, or between processing effort 
during storage and that during recognition 
or problem-solving activity. It is more eco- 
nomical to transform the information into a 
more or less uniform format of internal rep- 
resentation. That is in fact what most pro- 
grams do. Verbal information is transformed
into a predicate calculus mode (e.g., Coles,
1968; Kochen, 1969) or into the format of 
network elements or its equivalents; figural 
information is translated into property lists 
(Evans, 1964, 1968; Selfridge, 1959; Uhr & 
Vossler, 1963), which again may be in predi- 
cate calculus form (Coles, 1968). 

Some of the question-answering programs 
require an input precoded into a standard 
format (Abelson & Carroll, 1965; Colby, 
1965; Frijda & Meertens, 1969; Slagle, 
1965). Abelson (1966), for instance, presents 
his program with the statements of a certain 
politician, but in standardized paraphrases 
consisting of (possibly embedded) subject- 
predicate pairs. In other programs this work 
of coding is executed by the program, either 
by means of sentence-format matching (e.g., 
Bobrow, 1964, 1968; Colby & Enea, 1967; 
Craig et al., 1966) or by more or less elabo- 
rate syntactical analysis (Coles, 1968; Green 
et al., 1961; Kellog, 1968; Lindsay, 1963;
Simmons et al., 1966; Woods, 1968). In the 
program by Simmons et al., for instance, the
input sentence “Jack and Jill went up the
hill and fetched a pail of water” is automatically
analyzed, first into a set of constituents,
and then into the following “kernel
strings”:

1. (Jack went up the hill) and String 3;
2. (Jack fetched a pail) and String 4;
3. (Jill went up the hill);
4. (Jill fetched a pail);
5. (Pail (is of) water);

these kernel strings are further transformed
into A,R,B elements.

However, syntactic analysis cannot be the 
only verbal input transformation mechanism, 
in part because of the resulting ambiguities 
and the large number of alternative parsings. 
Matching with internal representation has to 
be achieved in continuous interaction with the 
memory store itself. Real progress in com- 
puter processing of verbal inputs has, at any 
rate, been made, thanks to the realization of 
such interaction. Tentative parsings may be 
semantically evaluated, on the basis of selec- 
tional restrictions, or, generally, the data 
base, before analysis continues (Coles, 1968; 
Kellog, 1968; Schank & Tesler, 1969;
Simmons et al., 1968; Woods, 1968).

Quillian (1969) has recently enlarged his 
system in an extremely interesting way, which 
performs input coding with a minimum of 
syntactical rules. The meanings of the words 
of the input string are retrieved, and their 
points of contact (their possible common as- 
sociates) noted; the possible relation between 
the words is constructed from the information 
in memory, and, finally, this is syntactically 
tested for correspondence with their function 
in the input string. New A,R,B complexes 
may have been formed during the process 
and are added to the system. Schank, Tesler,
and Weber,⁵ in a different and more complex
approach, based sentence parsing nearly entirely
upon extensive consultation of the information
associated with the words used or
with tentative A,R,Bs formed during processing
of the sentence. Among the information
consulted are unexpressed objects and actors, 
whose existence is inferred by means of the 
information store. In both programs—that of 
Quillian and that of Schank et al.—lexical 
ambiguity is resolved by the same means, 
which resolution also belongs to the input 
transformation activities. 


5 R. C. Schank, L. Tesler, & S. Weber. Spinoza
II: Conceptual case based natural language analysis.
(Stanford Artificial Intelligence Project Memo
AIM-109) Computer Science Department, Stanford
University, 1970.




It is important to note that this interaction 
between input transformation and memory 
store is not restricted to verbal inputs, but is 
of quite general utility and application. Re- 
solving ambiguities of letter identification in 
the analysis of handwritten text or in that of 
figural structure in the case of, for instance, 
overlapping geometrical figures involves a 
similar recourse to the memory store of existing
words or known shapes. In Evans’ (1968)
program, the figure [figure not reproduced]
is decomposed into [subfigure not reproduced]
and [subfigure not reproduced]
because both subfigures had previously been
met in a related figure. The same kind of
process is used in the segmentation of nonverbal
continuous events, which is necessary
for the recognition of such events (Becker,
1969; Mermelstein & Eden, 1964; Uhr,
1964).

ACQUISITION MECHANISMS AND
STORAGE RULES


The addition of four other postulates turns
this storage scheme into the coherent theory 
of acquisition which EPAM really is (Feigen- 
baum, 1963, 1970). These postulates are: 
(a) To the traces are linked retrieval cues, 
by means of which associations of the stimu- 
lus can be located through the same discrimi- 
nation net. Such a cue consists of a partial 
copy of the associated stimulus and con- 
tains, in principle, just that amount of detail 
which is necessary for the latter’s discrimina- 
tion. Association learning occurs as one-trial 
learning, provided that the system was ready 
for learning at the time of stimulus presenta- 
tion. (b) Each elementary cycle of discrimi- 
nation and construction—if necessary—of 
new tests takes a given amount of time; this 
is the time it takes to locate a familiar item 
in memory. (c) There is a short-term memory 
of fixed, very small capacity. (d) Distribu- 
tion of learning effort is in part governed by 
preference rules based, among other things, 
upon acknowledging anchor points among the 
stimuli (first and last items, for instance, or 
“striking” items). 

With these postulates, the theory is capable 
of explaining quite a number of phenomena 
of verbal learning; simulation in fact produced
these phenomena. It produced learning
curves manifesting the serial position effect, 
which match the experimental data quite 
closely. The effect derives from the preference 
rules together with the limited short-term 
memory: since processing takes time, and 
beginning and end of a list are attended to 
first, middle items are discarded from short- 
term memory during processing of beginning 
and end (Feigenbaum & Simon, 1962). It 
manifests the familiar consequences of intra- 
list and interlist similarity, as described by 
Gibson (1940). It is easy to see why it does
so: the greater the similarity of the stimuli, 
the more discrimination test nodes are re- 
quired and the more numerous the confusion 
at a given stage of learning. For the same 
reason the model, or the program, manifests 
generalization and retroactive inhibition: 
after the insertion of tests for the discrimination
of interpolated syllables, the retrieval
cues of the original list have insufficient detail.
For the same reason, again, the model
shows response oscillation (Feigenbaum &
Simon, 1961). Due to the processes of trace
construction and the unit processing time, 
meaningfulness effects and familiarity effects 
are present; the mechanisms explain their 
interdependence. 

The addition of one other postulated proc- 
ess produced an extremely interesting expla- 
nation, and simulation, of gradual learning in 
an all-or-none learning model. The process 
concerns an attention distribution strategy: 
the program can choose among two such 
strategies. In the first, it may try, with each 
stimulus presentation, to learn about that 
stimulus as much as it can; however, when- 
ever a new stimulus appears, this replaces 
the old one, or an old one, in the immediate 
memory, thereby interrupting the discrimina- 
tion and learning process of that replaced 
item. In the second strategy, the program 
tries to learn an item as completely as it 
can, neglecting meanwhile any new incoming 
information. In case the stimuli are not yet 
fully learned, the first strategy leads to grad- 
ual learning phenomena, whereas the second 
manifests all-or-none learning; learning time 
depends upon the speed of stimulus presenta- 
tion, just as in psychological experiments 
(Gregg & Simon, 1967a). 

The EPAM model, and the notion of the 
discrimination net, appear to have still more 
power because the process of trace formation 
can be performed at different superimposed 
levels; this in fact is needed for adequate 
simulation results. The program may first 
construct the integrated elementary objects— 
say, letters out of lines and curves—then 
words out of letters, etc. (Feigenbaum & 
Simon, 1964). The program may also be made 
to construct different, independent discrimi- 
nation nets, for instance, one for visual and 
one for auditory stimuli; the program is then 
capable of paired-associates learning between 
the two nets (Feigenbaum & Simon, 1963). 

EPAM's notions are further examined by
Hintzman (1968). Hintzman has constructed
a somewhat simplified version of EPAM,
called Stimulus and Association Learner 
(SAL), with which some of the implications 
and possibilities of the discrimination net are 
explored. SAL differs from EPAM in that it 
contains probabilistic processes, which are 
avoided in EPAM proper. These processes are
introduced to reproduce the performance variations
found with human subjects and to
allow the generation of group results. In
SAL, a new test is not introduced each time
a discrimination error is made, as in EPAM,
but only with a probability a. If no test is
added, the retrieval cue for the response is
replaced by that of the correct response with
a probability b. A number of simulations
with this model have investigated the effects 
of intralist similarity and of number of re- 
sponse alternatives; this has produced in- 
teresting results with respect to the theories 
of Bower (1962) and of Suppes and Ginsberg 
(1963). Of more general significance is the 
introduction of provisions for overlearning, in 
a version called SAL II, and of multiple 
associations to the same stimulus in SAL III. 
Overlearning is obtained by the addition of 
redundant tests when a correct response is 
made (also done by Wynn,⁶ in WEPAM).
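
The two probabilistic decisions that SAL makes after a
discrimination error can be sketched as follows. The
function name `on_error`, the outcome labels, and the
parameter values 0.3 and 0.5 are assumptions chosen for
illustration; they are not Hintzman's settings.

```python
import random
from collections import Counter


def on_error(rng: random.Random, a: float, b: float) -> str:
    """Decide what happens after one discrimination error.

    With probability a, a new test is added to the net. Otherwise,
    with probability b, the retrieval cue is replaced by that of the
    correct response. Otherwise nothing changes on this trial.
    """
    if rng.random() < a:
        return "add_test"
    if rng.random() < b:
        return "replace_cue"
    return "no_change"


rng = random.Random(0)  # fixed seed so that a run is repeatable
tally = Counter(on_error(rng, a=0.3, b=0.5) for _ in range(10_000))
```

Over many errors the expected proportions are a for a new
test, (1 - a)b for a replaced cue, and (1 - a)(1 - b) for no
change, which is what allows group learning curves to be
generated from a model whose single steps are all-or-none.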

In simulations with these latter models, 
Hintzman (1968) has obtained degrees of 
retroactive inhibition comparable to those in 
psychological experiments. He also has pro- 
duced proactive interference, which EPAM 
does not, differences between recall and recog- 
nition, and other phenomena of verbal learn- 
ing. It is true that all this has been obtained 
with the help of some tricks of unclear theo- 
retical meaning (precisely the probabilistic 
processes), but the studies show clearly the 
potentialities of the underlying ideas. 

It is interesting to examine which factors 
determine speed of information acquisition in 
EPAM. First, there is the size of immediate 
memory. If immediate memory is small, and 
presentation time is short relative to process- 
ing time, learning may be quite slow. 

Second, speed of acquisition depends upon 
item familiarity. Stable or useful associations 
can, in EPAM, be formed only if the respec- 
tive elements are sufficiently familiar—that is, 
have well-developed traces and discrimination 
net sequences. These are important points, 
since those variables determining speed of 
learning are quite distinct and deterministic, 
in contrast to the stochastic parameters of
most verbal learning theories (for a nice
exposition of the contrast between these two
approaches, see Gregg & Simon, 1967b).


6 W. H. Wynn. An information processing model
of certain aspects of paired associates learning.
Unpublished doctoral thesis, University of Texas,
Austin, 1966.

In complex information storage programs, 
speed or ease of acquisition depends upon 
additional and highly relevant factors. At 
least, this is so as far as useful acquisition is 
concerned—acquisition in such a way that 
the stored information can be retrieved or 
recalled with relative ease. In all question- 
answering programs, new knowledge is re- 
tained whenever it is presented. However, 
major efforts during acquisition have to be 
directed at inserting this new information 
into the memory network. As mentioned in 
the section on input transformation, recoding 
the input sometimes has to be based upon an 
intersection between the new input and the 
stored information. In Raphael's program, 
for instance, one of the alternative meanings 
of a word (*has") is selected (and input in- 
formation is recoded accordingly) on the 
basis of plausibility: if the subject of the word 
has already occurred as the subject of one of 
its meanings, this latter will be its most 
likely meaning now. If “John has fingers" is 
presented, and it is known that “Peter, has 
as parts, fingers’—not “Peter, possesses, 
fingers"—the first interpretation will be pre- 
ferred. In Quillian's program, specifications 
are effected in a similar way, on the basis of 
associations or available “cue words” (Bs, in 
our A,R,B complexes). 
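
The plausibility rule in the “has” example may be sketched
as below. The reading names in `READINGS`, the function
`interpret_has`, and the default taken when memory offers no
evidence are our assumptions; following the example, the
sketch keys on the shared element "fingers." It is not a
reconstruction of Raphael's program.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (A, R, B)

# Hypothetical names for the two readings of "has".
READINGS = ("has_as_parts", "possesses")


def interpret_has(subject: str, obj: str, memory: List[Triple]) -> str:
    """Prefer the reading under which `obj` has already been stored."""
    for reading in READINGS:
        if any(r == reading and b == obj for _, r, b in memory):
            return reading
    # Arbitrary default of this sketch when memory gives no evidence.
    return READINGS[0]


memory: List[Triple] = [
    ("Peter", "has_as_parts", "fingers"),
    ("Peter", "possesses", "car"),
]

new_fact = ("John", interpret_has("John", "fingers", memory), "fingers")
```

Here "John has fingers" is recoded as a part relation because
"fingers" already occurs in that relation for Peter, whereas
"John has car" would be recoded as possession.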

The examples given concern the resolution 
of ambiguities. The same interaction between 
input and stored information may be used to 
integrate new information into the network. 
By linking it up at an appropriate place, 
du e implications may become accessible 

; therefore, the corresponding variety of
proactive interference. Another reason lies in
the fact that most programs, by not
distinguishing between words and knowledge per
se, avoid many of the interactions just
described. A third and major reason is to be
found in the fact that all fact-retrieval pro- 
grams have, in contrast to EPAM, indefinite 
patience, unwavering attention, and unlimited 
time. Time is, of course, an important con- 
straint. The human system is confronted with 
a continuous stream of information, presented 
at a higher rate than that of processing. 
Consequently, the system needs information- 
acquisition strategies, to control the sequence 
of processing, input selection, level or degree 
of recoding, condensing, etc. Not too much 
systematic work on acquisition strategies has 
been performed yet, either in psychology or 
in computer simulation. As mentioned above, 
EPAM III has provisions for different strate- 
gies, with important theoretical consequences 
(Feigenbaum, 1970; Gregg & Simon, 1967b). 
The different versions of the Concept Learn- 
ing System by Hunt (1967) differ precisely 
in this respect. In the work by Jongman 
(1967), some suggestions for the field of chess 
perception are given that appear to lend
themselves to realization, as Simon and
Barenfeld (1969) have recently demonstrated.


BASIC RETRIEVAL MECHANISMS


Information retrieval may be made to oc- 
cur “spontaneously,” as in Colby’s and Loeh- 
lin’s programs. Usually, however, it is elicited 
by an external stimulus, a task or an inten- 
tion, which triggers the basic retrieval mech- 
anisms. 

The basic tools for information retrieval 
in those circumstances are the retrieval cue 
and a matching process. The retrieval cue, in 
nearly all programs, has all the characteristics 
that Selz (1913) ascribed to the anticipatory 
schema, and it has the same functions. It 
guides memory search, defining the kind of 
thing to be found, and it serves as the post 
hoc criterion against which success of search 
may be tested (cf. DeGroot, 1965; Selz, 
1964; Woodworth, 1938). The retrieval cue is 
a compound stimulus with the form of an incomplete
information molecule: A,R,?, or
A,?,B; in a limiting case only it may be
A,?,?.

In simple information retrieval, memory is 
searched for a content that maximally matches 
this incomplete schema: “Amsterdam, is a,
?” may lead to “Amsterdam, is a, capital.”
The process is obviously equivalent to what
Selz called “knowledge-unit completion”
(Komplexergänzung; Selz, 1913, 1964) and
which he considered the basic retrieval process.
It is one of two fundamental memory utilization
processes, the other being the truth
request: “Amsterdam, is a, capital, true or 
false?” Based upon a fully filled-in retrieval 
cue, this latter process is identical to that of 
recognition. In both cases, the retrieval cue 
has a degree of organization that is absent in 
the stimulus constellations of more classical
theories: namely, the explicit mention of the
functional relation between what is known 
and what is to be found. 

The processes by means of which the 
matching trace is located may be thought of 
as being those described in the section on
recognition. In actual fact, however, all 
question-answering programs use linear search 
of the lists corresponding to the retrieval cue's 
key element—the A in an A,R,B molecule. 
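
Cue completion and the truth request, both carried out by
linear search of the list belonging to the key element, can
be sketched as follows. The function names `complete` and
`is_true` and the toy memory are illustrative assumptions;
the second stored fact is added only to have something for
the search to skip.

```python
from typing import Dict, List, Optional, Tuple

Memory = Dict[str, List[Tuple[str, str]]]  # A -> list of (R, B)

memory: Memory = {
    "Amsterdam": [("located in", "Netherlands"), ("is a", "capital")],
}


def complete(
    memory: Memory, a: str, r: Optional[str] = None, b: Optional[str] = None
) -> List[Tuple[str, str, str]]:
    """Fill the open slots (None) of a cue by linear search of A's list."""
    found = []
    for rel, obj in memory.get(a, []):
        if (r is None or r == rel) and (b is None or b == obj):
            found.append((a, rel, obj))
    return found


def is_true(memory: Memory, a: str, r: str, b: str) -> bool:
    """Truth request: a fully filled-in cue either matches a trace or not."""
    return bool(complete(memory, a, r, b))


answer = complete(memory, "Amsterdam", r="is a")
```

The same routine serves both processes: an open slot turns
the search into completion, and a fully specified cue turns
it into recognition.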

With the help of these basic mechanisms of 
matching and of retrieval-cue completion, in- 
formation retrieval may be a trivial matter. It 
is in fact a trivial matter on condition that 
the retrieval cue corresponds literally to some 
memory trace or to a major part of a trace. 
In such a case, direct stimulus-response evo- 
cation or “immediate production of knowl- 
edge" (Selz, 1913) is possible. 

This condition usually does not apply, 
however. We have posited as a fundamental 
property of human memory its capability of 
evoking some of its contents even when such 
literal correspondence does not exist. It should 
be possible, and in fact often is, to retrieve 
information that is not directly addressable 
by the retrieval cues. Without such a possi- 
bility, the system would just not be capable 
of functioning profitably in a natural environ- 
ment, even when provided with standard in- 
put transformation mechanisms. The system 
realizes this possibility by the instigation of
active search procedures.


Two different descriptions can be given of 
these search procedures; these descriptions 
correspond to two contrasting conceptions of 
cognitive functioning generally (cf. Reitman, 
1965, for an extensive discussion). They may 
be called a goal-dominated and a diffuse acti- 
vation conception, respectively. Information 
processing may be conceptualized either as a 
process that is, step by step, determined by 
the goal or task at hand or, alternatively, as 
a process that tends to develop autonomously, 
governed by local conditions, but controlled 
post hoc by goal-determined testing and edit- 
ing. 

In a goal-dominated model, the task, as the 
subject understands it, selects and calls the 
information-processing operations. The goal 
determines, from start to end, which steps are 
to be taken. This is the kind of model pro- 
posed by Selz or by Newell, Shaw, and Simon 
(1958) and which corresponds most to the 
usual principles of computer programming. It 
is found, among others, in the programs by 
Black (1968), Frijda and Meertens (1969), 
Green et al. (1963), Raphael (1968), and 
Simmons et al. (1968). 

Memory search, in a goal-dominated mode, 
is performed by searching for the requested 
kind of thing; if the kind of thing is found, 
the actual thing is evoked and, if the task 
requires it, further tested against the re- 
trieval cue. This usually means that the list 
of the retrieval cue's key element is searched 
for the requested relation or something equiv- 
alent to this relation. If this relation is found, 
the B element that belongs to it is retrieved 
or constructed. 
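
The goal-dominated lookup just described can be sketched as follows. This is a minimal illustration and not code from any of the cited programs; the memory layout, the `EQUIVALENT` table, and the function name are assumptions made for the example.

```python
# Hypothetical memory: each key element A maps to its list of (R, B) pairs.
MEMORY = {
    "canary": [("is a", "bird"), ("color", "yellow")],
    "bird": [("is a", "animal"), ("can", "fly")],
}

# Assumed table of relations treated as equivalent to a requested relation.
EQUIVALENT = {"colour": {"color"}}


def goal_dominated_search(memory, a, r):
    """Scan A's list linearly for relation R (or an equivalent); return B or None."""
    wanted = {r} | EQUIVALENT.get(r, set())
    for relation, b in memory.get(a, []):
        if relation in wanted:
            return b  # the B element that belongs to the requested relation
    return None
```

The requested relation steers the scan from the first step: elements of A's list that do not carry R are never evoked.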

A diffuse activation model more resembles 
traditional associationist conceptions; it is 
found in the programs by Reitman (1965; 
Reitman et al., 1964) and Quillian (1967,
1968) and has been implemented in hard- 
ware by Giuliano (1967). Memory search, in 
a diffuse activation mode, is performed by 
retrieving all elements having to do with the 
retrieval cue's key element (or elements), 
possibly selecting the more plausible candi- 
dates among them. If such elements are 
found, they are tested to check whether any 
of them is the kind of thing requested and, 
consequently, the requested thing itself. This
usually means that every B element is located
that is related (directly, or indirectly if the
system searches that far) to the key element; 
subsequently, a test is performed to check 
whether the relation of any of those Bs to 
the key element corresponds to the requested 
R, or can be considered equivalent to it. The 
method to locate elements “having to do" 
with the key element is to explore all possible 
associations—that is, by diffuse activation: 
every element linked to the key element is 
tagged, or a unit of activation is added to its 
activation index. Plausible candidates can be 
selected if some activation indexes are higher 
than others. 
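
A diffuse activation search can be sketched in the same terms. The version below is only an illustration under stated assumptions: the memory layout, the `depth` parameter, and the simplification that the post hoc test inspects the link by which each element was reached are all choices made for the example, not features of the Reitman or Quillian programs.

```python
from collections import defaultdict, deque

# Hypothetical memory: each element maps to its list of (R, B) pairs.
MEMORY = {
    "canary": [("is a", "bird"), ("color", "yellow")],
    "bird": [("is a", "animal"), ("can", "fly")],
    "animal": [("has", "skin")],
}


def diffuse_search(memory, key, r, depth=2):
    """Evoke everything within `depth` links of `key`, then test post hoc for R."""
    activation = defaultdict(int)   # activation index per element
    reached = []                    # every (relation, B) link that was traversed
    frontier = deque([(key, 0)])
    visited = {key}
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue
        for relation, b in memory.get(node, []):
            activation[b] += 1      # add one unit of activation to B
            reached.append((relation, b))
            if b not in visited:
                visited.add(b)
                frontier.append((b, dist + 1))
    # Post hoc test: keep only the elements whose link carries the requested R.
    hits = [b for relation, b in reached if relation == r]
    return hits, dict(activation)
```

Here the requested relation plays no role until the final filtering step; everything linked to the key element is evoked first.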

Even in the case of simple truth requests 
(“A,R,B”—true or false?) the models may 
already give rise to different processes. In a 
diffuse activation model, activation starting in 
A may reach B, among others, and be de- 
tected by scanning B's own key list. If A and 
B are more than one node distant from each 
other, the stratagem also works. It is in fact 
applied for that purpose by Quillian (Collins 
& Quillian, 1971; Quillian, 1967, 1968). No 
such possibility exists in a goal-dominated 
model. There, the existence of a given B can 
only be detected by going down all branches 
extending from the A-node and systematically 
testing each B encountered (or those Bs re- 
maining after having identified the satisfac- 
tory relations). 

The models generally diverge in this way
when search is determined by more than one 
retrieval cue element. Another example is 
given by retrieval on the basis of a compound 
retrieval cue (“what is red, is a flower, has 
thorns?"). As described in the section on rec- 
ognition mechanisms, the elements forming 
the intersection of the search lists may, in a 
diffuse activation model, accumulate activa- 
tion from more than one source. In a goal- 
dominated model no such accumulation oc- 
curs, and the intersection elements can only
be discovered by pairwise comparisons be- 
tween each of the elements of each list. Un- 
der such conditions, a diffuse activation model 
will generally be much more economical. 
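
The economy argument can be made concrete with a small sketch. The search lists and the function name below are hypothetical; the tally stands in for the activation index, and each cue element is assumed to add exactly one unit to every element on its list.

```python
from collections import Counter

# Hypothetical search lists: each cue element maps to the elements linked to it.
SEARCH_LISTS = {
    "red": ["rose", "tomato", "poppy"],
    "flower": ["rose", "poppy", "tulip"],
    "thorns": ["rose", "cactus"],
}


def intersect_by_activation(search_lists, cue):
    """One pass over each list; elements activated by every cue element intersect."""
    activation = Counter()
    for element in cue:
        # A set is used so that one source adds at most one unit per element.
        activation.update(set(search_lists.get(element, [])))
    return sorted(e for e, units in activation.items() if units == len(cue))
```

Each list is visited once, so the work grows with the sum of the list lengths, whereas pairwise comparison between lists grows with their product.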

A diffuse activation model has more useful 
properties, which increase its plausibility as a 
psychological theory. As indicated earlier,
activation of the elements in the search set
may be made to irradiate into each of those
elements' surroundings, extending to one or
more nodes beyond the points of departure. 
Reference has been made already to the fact 
that Quillian (1969) applies this process to 
disambiguate words in a sentence: the activa- 
tion of the meaning network of only one of 
an ambiguous word's meanings is reinforced 
by other relevant words in the sentence. Also, 
the build-up of an activation pattern on the 
basis of, say, a coherent story may yield an 
internal representation of that story's mean- 
ing; this activation pattern functions as 
short-term storage for that information, much 
as in Hebb's (1949) original conceptions. 
Coupled to the conception of a network of 
active semantic elements, into which it fits 
well, diffuse activation permits information
processing to be influenced diffusely indeed, 
even in the absence of a specific goal, by 
aftereffects of earlier activity. Associative 
priming, set, and the like may thus be real- 
ized, and the course of free association can 
be made less implausibly capricious than 
otherwise (although such capriciousness may 
also be reduced by more directed, “cognitive”
means [Colby & Enea, 1967]). Finally, dif-
fuse activation is a way to enable problem 
solving, or a course of thought, to change its 
goals because of chance encounters in the data 
—an event not infrequently met in human 
thought; Reitman (1965) in particular em-
phasizes this point.

Obviously, depth of irradiation and, in par- 
ticular, rules for activation decay have to be 
specified. In Quillian's program, for instance,
decay is complete after a task is finished; in
Reitman's program, decay is a function of
time.
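
The two decay rules can be contrasted in a short sketch. The text says only that decay is a function of time in one case and complete at the end of a task in the other; the exponential form, the `half_life` parameter, and the class itself are assumptions introduced for illustration.

```python
class Activation:
    """Activation indexes with two assumed decay rules."""

    def __init__(self, half_life=10.0):
        self.half_life = half_life   # assumed: activation halves every half_life units
        self.level = {}              # node -> (activation, time of last update)

    def current(self, node, now):
        """Time-based decay: the stored value shrinks with elapsed time."""
        value, stamp = self.level.get(node, (0.0, now))
        return value * 0.5 ** ((now - stamp) / self.half_life)

    def activate(self, node, now, amount=1.0):
        """Add activation on top of whatever has not yet decayed."""
        self.level[node] = (self.current(node, now) + amount, now)

    def end_task(self):
        """Task-based decay: all activation is wiped when the task is finished."""
        self.level.clear()
```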

It should be remarked that goal domination 
and diffuse activation are not really alterna- 
tive conceptions; or, rather, that goal domina- 
tion within a diffuse activation model is quite 
possible. On the one hand, random or ex-
haustive generation of associations to a given
stimulus may be considered a goal-directed
strategy. This strategy applies when, for in-
stance, the other cue components are not ad-
dressable; that, in fact, constitutes the “gen- 
erate-and-test-strategy” described by Newell 
and Simon (1967). Diffuse generation of 
associates, one or more nodes deep, can like- 
wise be considered a method to be applied 
under specified conditions; Abelson and
Reich (1969) posited a "completion ten- 
dency" as this condition. In fact, all activa- 
tion phenomena can be mirrored in a goal- 
dominated model with the help of one or 
more working memories; activation and 
working memories are functionally equiva- 
lent. On the other hand, in a diffuse activa- 
tion model the retrieval cue may as a whole 
determine activation: the more involved the 
retrieval cue, the more focal the activation it 
elicits, and the more the process approaches 
that of goal-dominated search. The organiza- 
tion of retrieval cues may well be one of the 
major variables influencing retrieval efficiency 
in human subjects. 

Whatever the basic model, one of the core 
problems in modeling retrieval is the same 
for both: specifying priority rules for evok- 
ing the associations of a given node, or, in 
other words, probability of evocation of a 
given associate. Systematic search, as pro- 
vided by most programs, is surely not the 
most plausible procedure; stochastic drawing 
with replacement may have to be proposed 
instead. Preference based upon frequency or 
recency is the only guideline that psychology 
offers for weighting within such search pro- 
cedures, and this principle does not appear 
really helpful to explain the order of evoca- 
tion of meaning components. 


UTILIZATION PROCEDURES: INFORMATION- 
RETRIEVAL STRATEGIES 

As just stated, information retrieval is a 
relatively simple matter when there exists a 
trace that exactly matches the retrieval cue. 
If this is not the case, active search pro- 
cedures have to be launched. These can be 
described as information-retrieval strategies. 

Maybe the simplest deviation from immedi- 
ate reproduction of knowledge occurs when 
traces matching the retrieval cue do exist, but 
are not addressable by this cue, or where the 
cue is too unspecific. If, for instance, a few 
digits are briefly shown and the subject has
to report which digits they are, one may as- 
sume these digits to be stored as digit x, 
presented in, experiment just now, whereas 
no trace that is addressable with the cue 
“digit?” exists, nor a trace corresponding to 
the list “experiment just now, digits pre-
sented?” In such cases, transformation of the
recall task into a recognition task applies, or, 
in other words, the “generate-and-test-strat- 
egy”: running off all digits will make the sys- 
tem stumble upon the trace mentioned. 
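
The digit example can be sketched as a generate-and-test loop. The trace format and the function name are hypothetical; the point is only that recall is replaced by running off candidates and recognizing the ones that match a stored trace.

```python
# Hypothetical traces stored as (A, R, B); none is addressable by "digit?" alone.
TRACES = {
    ("digit 3", "presented in", "experiment just now"),
    ("digit 7", "presented in", "experiment just now"),
    ("digit 5", "presented in", "experiment last week"),
}


def recall_by_generate_and_test(traces, context):
    """Run off all ten digits; keep those for which a matching trace is recognized."""
    recalled = []
    for d in range(10):                       # generate
        candidate = (f"digit {d}", "presented in", context)
        if candidate in traces:               # test (recognition)
            recalled.append(d)
    return recalled
```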

Information-retrieval strategies—informa-
tion-processing strategies, generally—may be 
described as methods for transforming the 
task into another, possibly more easily solv- 
able one (DeGroot, 1965; Duncker, 1945; 
Selz, 1922). In information-retrieval activity 
other than the previous case, this may be 
specified as transformation of the retrieval 
cue such that, and until, correspondence with 
some memory trace is found. 

The simplest method to transform the re- 
trieval cue is by interpretation. The retrieval 
cue, or one or more of its components, is re- 
placed by one of its meaning components, as 
this is present in the network; or the cue 
components may be transformed by opera- 
tions indicated by those meaning elements. A 
number of such transformations may take 
place successively. In the program “Baseball” 
(Green, 1963, Chapter 13; Green et al., 
1961), for instance, the question “How many 
teams played in eight different places in 
July?” is first, by syntactic analysis, trans- 
formed into the canonical form “(team, 
number of),?; (places, number of), 8; month, 
July.” Interpretation of “number of” leads 
to the cue “team, ?; (places, number of), 8; 
month, July,” and the instruction to count 
the number of traces corresponding to this 
cue. Since no trace contains the component 
“(places, number of), 8,” interpretation of 
“number of” plus a value leads to “team, 
each; (places, number of), ?; month, July,” 
where “each” is understood to apply to every 
value of the attribute “team.” Finally, re- 
peated interpretation of “number of” trans- 
lates into the cue “team, each; place, ?;
month, July," plus the instruction to count. 
This cue matches traces like “team, Red Sox; 
place, Boston; month, July." 

The interpretation process continues until 
a retrieval cue is obtained that matches 
some trace. This is the case for the last cue 
only. But once this cue has been completed, 
for each available value of “team,” the ac- 
companying counting process has formed a 
trace for the preceding retrieval cue, and so 
up again. The process is an example of the
general dynamism of problem-solving proc- 
esses, formulated in Miller, Galanter, and 
Pribram’s (1960) TOTE model, and imple- 
mented most explicitly in Newell, Shaw, and 
Simon’s (1958) General Problem Solver pro- 
gram (Ernst & Newell, 1969; Newell & Si- 
mon, 1963). The course of processing is gov- 
erned by feedback of the difference between 
desired state and actual state of the problem; 
the desired state, in the present case, is an 
expression without a question mark. 
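
The way the completed innermost cue feeds the counting instructions of the preceding cues can be sketched as below. This is not the Baseball program's code; the trace records, function names, and the small sample data are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical traces in the attribute-value style of the example.
TRACES = [
    {"team": "Red Sox", "place": "Boston", "month": "July"},
    {"team": "Red Sox", "place": "New York", "month": "July"},
    {"team": "Yankees", "place": "New York", "month": "July"},
    {"team": "Yankees", "place": "Boston", "month": "June"},
]


def places_per_team(traces, month):
    """Innermost cue 'team, each; place, ?; month, M': collect places per team."""
    places = defaultdict(set)
    for trace in traces:
        if trace["month"] == month:
            places[trace["team"]].add(trace["place"])
    return places


def count_teams(traces, month, n_places):
    """Outer cues: count places per team, then count the teams with n_places."""
    per_team = places_per_team(traces, month)
    return sum(1 for p in per_team.values() if len(p) == n_places)
```

With the sample data, `count_teams(TRACES, "July", 2)` plays the role of the question about eight different places, scaled down to two.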

The interpretation of retrieval cue com- 
ponents can occur automatically, whenever 
the cue cannot locate a matching trace. For 
instance, in the program by Frijda and Meer- 
tens (1969), a definition may be asked: what 
is an A; in coded form: “A, is, ?” If no ready- 
made definition exists—that is, no element 
“A, is, B" can be found—the system will in- 
vestigate the meaning of “A” and of “is.” 
For “is” it may find: “is, facet, (form, color, 
function, made of, . . .)," and it will suc- 
cessively replace “A, is, ?” with “A, form, 
?," “A, color, ?,” etc. The process may be 
described either as an “interpretation strat- 
egy” for solving retrieval problems or as 
irradiation of activation toward meaning ele- 
ments. At any rate, the memory network is 
used for cue transformation with the help of 
very little preprogrammed machinery. Obvi- 
ously, the described interpretation process 
may yield entirely useless associations that 
will have to be pruned away—a topic that is 
discussed below. 
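
A minimal sketch of this automatic cue transformation follows. The dictionaries and the function name are hypothetical and do not reproduce the Frijda and Meertens program; only the facet list of “is” is taken from the example in the text.

```python
# Hypothetical store: (A, R) -> B.
MEMORY = {
    ("rose", "is"): "flower",
    ("ball", "form"): "round",
    ("ball", "function"): "play",
}

# Meaning of "is", as in the example: a list of facets that may replace it.
MEANING = {"is": ("form", "color", "function", "made of")}


def define(memory, meaning, a):
    """Answer 'A, is, ?' directly, or replace 'is' by each of its facets in turn."""
    direct = memory.get((a, "is"))
    if direct is not None:
        return [("is", direct)]
    found = []
    for facet in meaning.get("is", ()):
        b = memory.get((a, facet))      # transformed cue: 'A, facet, ?'
        if b is not None:
            found.append((facet, b))
    return found
```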

Many information-retrieval tasks require 
more elaborate search strategies than simple 
retrieval. For instance, to solve verbal anal- 
ogy problems, the system may be told to 
search for the relation between the first two 
words, then for a response word that stands 
in that relation to the third word (this is a 
simplification of the strategy in Reitman’s 
program, where diffuse activation yields 
plausible candidates for appropriate rela-
tions). Another example: requests for a defi-
nition may instigate the search for a super- 
ordinate concept plus a distinctive specifica- 
tion. 

In information-retrieval tasks, the most powerful
strategies are those that make use of the data store's in-
ference potential; they make use of the mean-
ing of the system's relations and the rules of 
inference that belong to that meaning. Prop- 
erty requests (A,R,?) may be solved by 
means of the strategy: find a superordinate 
class of A (A, is a, X); find a property of this 
class (X,Q,Y); if this is the right kind of 
property, Y is the solution. Raphael’s exam-
ple (see section entitled “Structure of Im- 
plicit Information”) is of this kind. Inference 
of this sort is used in a great many programs: 
Black (1968), Cooper (1964), Frijda and 
Meertens (1969), Raphael (1968), Simmons 
et al. (1966), Slagle (1965), to mention some 
of the more important ones. 
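
The superordinate-class strategy can be sketched as a preprogrammed procedure. The fact store, the assumption of a single superordinate per element, and the function name are all illustrative choices and are not drawn from any of the programs listed.

```python
# Hypothetical store: (A, R) -> B, with at most one "is a" link per element.
FACTS = {
    ("canary", "is a"): "bird",
    ("bird", "is a"): "animal",
    ("bird", "locomotion"): "flies",
    ("animal", "covering"): "skin",
}


def find_property(facts, a, r):
    """Answer (A, R, ?) directly or by climbing the chain of superordinate classes."""
    seen = set()
    while a is not None and a not in seen:   # `seen` guards against circular chains
        seen.add(a)
        if (a, r) in facts:
            return facts[(a, r)]
        a = facts.get((a, "is a"))           # find a superordinate class of A
    return None
```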

Strategies like these are not only used in, 
and useful for, problem solving in the strict 
sense. They also function in neurotic belief 
distortion, as in Colby’s program. The mech- 
anism of displacement, for instance, trans- 
forms a belief into another belief, which is 
directed towards a different but similar ob- 
ject. Such an object is found by first deter- 
mining the class to which the original object 
belongs, and next finding another member of 
that class. Me hates father; father is a man 
older than 40 years; Allen is a man older 
than 40 years; me hates Allen (Colby, 1965; 
Colby & Gilbert, 1964). 
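
The displacement step can be sketched as a class lookup followed by a substitution. The class table and function name are hypothetical; Colby's program is not reproduced here, and picking the first alternative in alphabetical order is an arbitrary choice made for the example.

```python
# Hypothetical class memberships, following the example in the text.
CLASS_OF = {
    "father": "man older than 40 years",
    "Allen": "man older than 40 years",
    "sister": "young woman",
}


def displace(belief, class_of):
    """Redirect a belief toward another member of the original object's class."""
    subject, relation, obj = belief
    target_class = class_of.get(obj)
    for other in sorted(class_of):
        if other != obj and class_of[other] == target_class:
            return (subject, relation, other)
    return belief  # no similar object found: the belief is left unchanged
```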

Abelson's (1966) program determines the 
credibility of assertions by similar means: a
statement is credible if the memory store 
contains more supporting than contradictory 
beliefs; a supporting belief is one the sub- 
ject of which is an exemplar of the subject of 
the statement, and the predicate of which is 
an exemplar of the predicate of the statement. 
Colby, again, used similar methods to assess 
whether the “reason” for a belief is “war- 
ranted” (Colby & Enea, 1967). The proc-
esses in both programs are quite plausible as
simulations of human thinking; human
thought, in fact, seems to move easily and
willingly along the lines of hierarchies of 
classes. 
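
The credibility rule can be sketched as a tally over stored beliefs. The text does not say how contradictory beliefs are represented, so the truth flag on each belief is an assumption, as are the sample data and the function name; this is not Abelson's program.

```python
# Hypothetical exemplar lists for a subject class and a predicate class.
EXEMPLARS = {
    "birds": {"canary", "sparrow", "penguin"},
    "move through the air": {"flies", "glides"},
}

# Hypothetical beliefs: (subject, predicate, holds). The flag is an assumption.
BELIEFS = [
    ("canary", "flies", True),
    ("sparrow", "glides", True),
    ("penguin", "flies", False),
]


def credible(statement, beliefs, exemplars):
    """A statement is credible if supporting beliefs outnumber contradictory ones."""
    subject, predicate = statement
    support = contradict = 0
    for s, p, holds in beliefs:
        # Relevant beliefs pair an exemplar of the subject with one of the predicate.
        if s in exemplars.get(subject, ()) and p in exemplars.get(predicate, ()):
            if holds:
                support += 1
            else:
                contradict += 1
    return support > contradict
```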

The last examples are examples not of de- 
ductive but of inductive inference. Inductive 
inference is indeed in many respects within 
the capabilities of the system as described. In 
a sense, the data store as a whole is a body of
knowledge that may be used for inductive
inference. The information, associated to a
given node, represents a model of a situation
or other state of affairs, which may be evoked 
to construct hypotheses or establish expecta- 
tions: since roses have fragrance, what about 
the smell of this one? In Quillian's program, 
and that of Schank et al. (see Footnote 5), 
the nature of subjects and objects of relations 
mentioned in an input sentence is diffusely 
evoked and tested against other words of the 
sentence, to resolve syntactic and lexical am- 
biguities and, generally, parse the sentence. 
In addition, explicit inductive inference 
rules can be and have been used, as men- 
tioned before. Abelson and Reich (1969) em- 
ployed their “implicational molecules" to gen- 
erate a field of meaning that may be used in 
evaluating credibility of a message or to ex- 
tract new information from it: a psychoana- 
lytically oriented system may derive “A, 
wants, Y” from the fact “A, fears, Y” by the
rule “(A, fears, Y), implies, (A, wants, Y).” 
Expectations generated in this manner may 
serve to maintain some direction in a flow of 
association, or of conversation. Colby and 
Enea’s (1967) program maintains a meaning- 
ful dialogue with a human subject by assum- 
ing, for instance, that “why” questions will 
usually be answered with "because" state- 
ments; if this does not happen, the question 
is rephrased and repeated. In a study by 
Becker (1969), stored event sequences are 
explicitly used to make predictions, test these 
against coming events, detect inconsistencies, 
and modify the data store if the outcome 
suggests that it should. Quite generally, the 
system is capable of asking spontaneous ques- 
tions—why, whence, how—by comparing in- 
put with the field of associated information of 
its trace and detecting where input informa- 
tion is lacking. Induction may, at this point, 
even go further. Input may be compared to its 
matching trace, but it may also be compared 
to traces that are only similar: to schemata 
that are more abstract (contain variables 
where the input contains constants) or that 
are just different in one or more of their at- 
tributes. This, in fact, is a further mechanism 
of induction, which has been indicated by 
Kochen (1963) and by Becker; also, it is gen- 
erally the mechanism for the application of 

The inference processes as described are
clearly goal-dominated strategies. A diffuse
activation description is an alternative ap-
proach to the same problems, maybe equally 
plausible as a psychological model, and prob- 
ably somewhat more flexible. When asked for 
a definition, the system may diffusely evoke 
all associations of the concept and test post 
hoc whether the relations, or the entire chains 
of relations over various nodes, fit the notion 
of a definition—that is, of the notion of “is.” 
Obviously, in this testing phase the same in- 
ference rules as in the goal-dominated strate- 
gies described above, and a similar mechanism 
to employ them, will be needed. Only the gen- 
eral organization and context of the processes 
differ, and so may some of the economic as- 
pects and the conditions under which either 
approach may be most profitably applied. 

Information-processing strategies like those 
mentioned may be called by the task specifi- 
cation: problems labeled as verbal analogy 
problems signal the corresponding procedure; 
requests for a definition signal another, etc. 
Those problem-solving procedures are pre- 
pared in advance by the programmer and 
coupled to task cues by way of a “table of 
connections” (Newell & Simon, 1963b). How- 
ever, it appears that this is not always neces- 
sary. On the contrary, many or most of the 
problem-solving strategies can be easily com- 
piled from the information in the memory net- 
work itself. The material is fully available in 
the meaning components of the relations. The 
problem-solving methods may be made to 
emerge step by step during the course of 
problem solving. 

This happens in several programs: those of 
Black (1968), Colby, Tesler, and Enea 
(1969), Frijda and Meertens (1969), Green 
and Raphael (1968), and Slagle (1965). If a 
retrieval cue does not find a matching trace, 
it may find a matching general schema—an 
expression for the variables of which the com- 
ponents of the retrieval cue may be substi- 
tuted. If the matching schema is the left-hand 
side of a rule of inference, the retrieval cue
or the known facts may be transformed into
the corresponding filled-in right-hand sub-
expression. Let the question be asked: “Soc-
rates, nature, ?” and the facts be known
“Socrates, is a, man” and “man, nature, mor-
tal.” Let the schema exist “[(A, is a, B), and
(B,R,C)], implies, (A,R,C).” The retrieval cue
matches part of the schema; the match is
meaningful, considering the known facts; a 
new fact may be constructed that fully 
matches the cue and yields the answer. 
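
The Socrates example can be worked through with a small pattern matcher in which the schema is ordinary data rather than a programmed procedure. The `?`-prefixed variable notation, the tuple encoding of the schema, and the function names are assumptions made for the sketch; none of the cited programs is reproduced here.

```python
FACTS = {
    ("Socrates", "is a", "man"),
    ("Socrates", "age", "old"),
    ("man", "nature", "mortal"),
}

# The schema from the text, stored as data: (premises, conclusion).
SCHEMA = (
    [("?A", "is a", "?B"), ("?B", "?R", "?C")],
    ("?A", "?R", "?C"),
)


def unify(pattern, triple, env):
    """Extend the variable bindings so that pattern equals triple, or return None."""
    env = dict(env)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if env.setdefault(p, t) != t:
                return None          # variable already bound to something else
        elif p != t:
            return None              # constants differ
    return env


def solve(premises, facts, env):
    """Yield every set of bindings under which all premises match known facts."""
    if not premises:
        yield env
        return
    for fact in facts:
        new_env = unify(premises[0], fact, env)
        if new_env is not None:
            yield from solve(premises[1:], facts, new_env)


def infer(cue, schema, facts):
    """Match the cue (A, R, ?) to the conclusion, then fill in via the premises."""
    premises, conclusion = schema
    a, r, _ = cue
    env = {conclusion[0]: a, conclusion[1]: r}
    for result in solve(premises, sorted(facts), env):
        return result[conclusion[2]]
    return None
```

For the cue `("Socrates", "nature", "?")` the first premise binds `?B` to `man` and the second binds `?C` to `mortal`, which is the constructed fact that answers the question.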

The appropriate rule of inference may be 
found by blindly scanning through the data 
store. More adequately, however, it may be 
found by interpretation of the retrieval cue's 
components. “Socrates” yields, let us assume,
“Socrates, possesses, wisdom,” “Socrates, age, 
old,” and “Socrates, is a, man.” Let us also 
assume that, as in the program by Frijda and 
Meertens, the inference schema is part of 
the meaning of “is a”: “is a, aspect, [(A, is
a, B), and, (B,R,C)], implies, (A,R,C).” Of
Socrates’ meaning components, only “is a,
man” is useful since it yields a matching
schema, and substitution in it is explored. 

Note that the process as sketched is guided 
solely by the effort to fill out the empty 
places in the retrieval cues, together with a 
general interpretation procedure. Note also 
that the problem-solving strategy for finding 
a property, “find superset; find one of its 
properties" is synthesized on the basis of 
the available network information; the strat- 
egy is one of the problem-solving methods 
derived by Selz (1922) from protocol analy- 
sis of similar tasks. The principle incorpo- 
rated in these programs seems of considerable 
generality: strategies to follow and problem- 
solving operations to perform are derived 
from the very information they have to proc- 
ess. The point has been elaborated by Elshout 
and Frijda (1966). Problem-solving methods 
may of course be learned as such; but they 
may in many cases be constructed ad hoc 
during processing, and they may be retained 
and reapplied, once they have been con-
structed in this manner. The memory systems
described function, be it in a limited way, as 
program compilers. 

The same procedure of searching for a 
meaningfully matching inference rule, inci- 
dentally, may determine the selection of
meaning components for retrieval cue trans- 
formation. Earlier in this section we indicated 
that the collection of meaning components, 
brought up by interpretation, should be 
pruned. Only a few will be useful, namely, 


those which, because of inference rules, per- 
mit such cue transformation. The relevant 
inference rules, if they exist, are present 
among the meaning components of the rela- 
tion concerned. Only because “facet,” in the 
example “is, facet, (form, color, function, 
made of, . . .)," indicates substitution possi- 
bility, and because that relation is in fact 
interpreted, can the indicated cue transforma- 
tion occur. In the interpretation process of
the Frijda and Meertens program, all mean-
ing components of the retrieval cue com- 
ponents are evoked; the relation of each of 
those is interpreted in their turn, to test for 
usefulness. Thought moves on only in those 
directions that feel promising. 

Even so, search may still be too extensive; 
in particular when data stores would grow to 
realistic proportions (Minsky, 1968). The 
capacity for interpretation and inference may 
lead the program into very extensive re-
cursions. These problems are largely unsolved
in existing programs. Arbitrary limits on re-
cursions are usually set; this obviously is a
quite unsatisfactory solution, and there exists 
a distinct need for further search-reducing 
heuristics. Something may be done with effort- 
evaluation procedures, first steps for which 
are undertaken by Colby et al. (1969). 

The fact that inference rules are embedded 
in the data structure, and that problem-solv- 
ing procedures are thus derived from that 
structure, embodies a point of great impor- 
tance, which should be strongly emphasized. 
The intelligence of the system resides entirely 
in the data network. In fact, the programmed 
procedures in the Frijda and Meertens pro-
gram, for instance, are nearly trivial search 
routines and organizational procedures. All 
reasoning ability comes from the facts in the 
network, and from their “intelligent” rela- 
tions, which enable the program to solve a
variety of problems—verbal analogies and
similarities, for instance, besides straight-
forward retrieval tasks or questions like, “In
what country lives somebody who lives in
Amsterdam?” The point is of obvious psy-
chological significance: it is not human
thought methods that are powerful, but human
knowledge. “Mental operations” are simple;
but the content and organization of informa-
tion are the main roots of intelligence.


OUTPUT CONSTRUCTION MECHANISMS


Little can be said concerning the final phase 
of information retrieval, that of output con- 
struction. Word construction is a trivial, non- 
simulatory process in nearly all programs; 
only EPAM puts some effort in constructing 
its output syllables from bit-strings repre- 
senting parts of letters. Word finding or nam- 
ing, too, is usually trivial; none of the pro- 
grams would be capable of tip-of-the-tongue 
phenomena. In all programs a name is at- 
tached to each stored element (or no distinc- 
tion between names and stored information 
elements is made). However, names should 
have to be found, before output can be pro- 
duced, by a process that is identical to rec- 
ognition; or a name, connected to a trace, 
should be tested to check whether its meaning 
really yields a satisfactory match with the 
trace, or traces, concerned (Frijda, 1967). In 
such a manner would output processes become 
considerably more realistic, and, in particular, 
they would be capable of handling the loose 
relation between names and meanings also at 
the output side.

Sentence construction occurs in several
programs, either through fixed format filling 
(e.g., Colby, 1967; Colby & Enea, 1967; 
Raphael, 1968) or through the use of some 
form of generative grammar (e.g., Craig et
al., 1966; Quillian, 1967). The construction
of an equivalent of “mental images” has not
been undertaken, to the author's knowledge; 
it may well be a worthwhile subject, since 
the constructive nature of mental images is 
fairly well agreed upon (e.g. Bartlett, 1932; 
Neisser, 1967; Piaget & Inhelder, 1966). Out- 
put construction would seem an important 
aspect of information processing generally, 
since outputs, particularly of tentative solu- 
tions, are usually fed back into the system, 
influencing its further processing in a poorly
understood manner.


FORGETTING

In the preceding description of a memory
system, hardly any mention has been made
of forgetting. It obviously is not [illegible]
of such a system; nor, anywhere in the con-
struction of a system with the required capa-
bilities, has a need arisen for forgetting mech-
anisms. The question may therefore be asked
whether such mechanisms should be incorpo- 
rated, since it is so striking a feature of hu- 
man memory activity, or how forgetting can 
otherwise be understood, in the context of the 
described models. 

In fact, some forgetting may appear with- 
out any special provisions because memory, 
in those models, is not a static structure. It is 
constantly modified because of the cogitations 
of the system. The majority of programs add 
the results of their information processing to
the memory store. The transformed and ac-
ceptable ideas are inserted in the Abelson 
and Carroll and Colby programs; from then 
on they codetermine the computation of credi- 
bility and thus influence future events. The 
programs by Raphael and Frijda and Meer- 
tens, too, add the knowledge produced by 
deductive inference; if the same question is 
posed a second time, it will be answered 
quicker and by a different path. 

This state of change due to the arrival of 
new information does not always result in 
increased power. As has been mentioned be- 
fore, the growth of EPAM's discrimina-
tion net may result in a loss of previously
correct associations. Forgetting of such dis- 
placed traces occurs: although the traces are 
there, they have become inaccessible (Feigen- 
baum, 1970; Feigenbaum & Simon, 1961). 

Forgetting can be obtained in still other 
ways. If the number of associations of a given 
network node increases and if associations are 
selected randomly, probability of evocation 
of each association decreases. Such forgetting 
will be more effective if strength measures 
influence that probability. Still another mech- 
anism for forgetting may be found in multi- 
ple connections between the nodes. If the 
system provides for propagation of activa- 
tion, as in the Reitman and Quillian pro- 
grams, and if the activation of a given node 
depends upon the number of activated links 
that end in that node, learning of new associ- 
ations between nodes may eclipse previous 
learning of other associations. 

To increase the dynamism of the system, 
the associations of a given node may be col- 
lected in a “push-down-stack,” which is one 
simple method to implement availability in- 
dexes and which has been used by Hintzman 
(1968) in SAL III. Information is stacked in
order of arrival; this order may be changed 
by usage, and only the topmost element is 
accessible; or the probability of evocation 
may be made proportional to the serial posi- 
tion in this stack. These mechanisms have 
been suggested by Bower (1967) in his theory 
of memory. They provide for nearly complete 
forgetting of old and little used information. 
Accessibility difference based upon order of 
arrival has, in fact, a certain plausibility. 
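
The push-down stack can be sketched as follows. This is not SAL III; the class and its method names are hypothetical, associations are assumed to be unique within a stack, and the text leaves open in which direction probability depends on serial position, so the weighting that favors the top of the stack is an assumption.

```python
class AssociationStack:
    """Associations of one node, stacked in order of arrival (top = most recent)."""

    def __init__(self):
        self._items = []   # the last element of the list is the top of the stack

    def store(self, association):
        self._items.append(association)

    def use(self, association):
        """Usage changes the order: a used association moves back to the top."""
        self._items.remove(association)
        self._items.append(association)

    def evoke(self):
        """Variant 1: only the topmost element is accessible."""
        return self._items[-1] if self._items else None

    def evocation_probabilities(self):
        """Variant 2 (assumed direction): probability grows toward the top."""
        n = len(self._items)
        total = n * (n + 1) / 2
        return {item: (i + 1) / total for i, item in enumerate(self._items)}
```

Under either variant, an association that is neither new nor recently used sinks toward the bottom and becomes hard or impossible to evoke, although it is never erased.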
Beyond EPAM and SAL, only Reitman's 
program shows forgetting. There, it is assured 
by an activity parameter that decays over 
time. Obviously, this is a quite extrinsic ad- 
dition to the system that models a phenom- 
enon but not a mechanism. The interference 
processes just mentioned, which produce pro- 
active as well as retroactive inhibition, ap- 
pear more interesting. Whether they are suf- 
ficient to produce the kinds and degrees of 
forgetting that humans manifest remains to 
be seen; the degree would vitally depend upon 
the amount of information stored, and few 
experiments of the necessary kind, if any, 
have yet been performed. 


SUMMARY OVERVIEW AND EMPIRICAL 
EVALUATION 


In the preceding review, several programs 
have been discussed that process complex in- 
formation. Some of these had as their ex- 
plicit aim the simulation of human function- 
ing, or of some of its aspects: those of Lind- 
say, of Raphael, of Reitman, of Quillian, of 
Frijda and Meertens, EPAM and SAL, of 
Abelson and Carroll, and of Colby. 

Through this ensemble of programs, the 
outlines of a more or less coherent model of 
memory have become visible. This model con- 
sists of a large network of associations in 
which the nodes refer to the ideas, and the 
links to the relations between those ideas. 
The ideas may be simple or may themselves 
be composed of small networks that manifest 
complex and unorderly interconnections. The 
network should be considered as a system for 
the addressing or reconstruction of inputs,
rather than as containing any sort of repro- 
ducible trace of those inputs. External stim- 
uli obtain contact with this network by way 
of a discrimination process that is governed
by a discrimination net, or by parallel prop-
erty matching; in this process, the stimuli are 
transformed so as to correspond with the in- 
ternal mode of information representation, 
a transformation which is produced in inter- 
action with the contents of the information 
store. 
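
A minimal sketch in Python may make this description of a network with labeled links more concrete. The class name `AssociativeNetwork`, its methods, and the sample facts are illustrative assumptions; they are not taken from any of the programs reviewed here.

```python
from collections import defaultdict


class AssociativeNetwork:
    """Toy network: nodes stand for ideas, labeled links for relations (hypothetical)."""

    def __init__(self):
        # node -> list of (relation label, other node) pairs
        self.links = defaultdict(list)

    def store(self, a, relation, b):
        """Store one labeled link leading from idea a to idea b."""
        self.links[a].append((relation, b))

    def related(self, a, relation):
        """Return the ideas reached from a over links that carry the given label."""
        return [b for r, b in self.links[a] if r == relation]


# Illustrative content only.
net = AssociativeNetwork()
net.store("rose", "is-a", "flower")
net.store("rose", "has-color", "red")
net.store("flower", "is-a", "plant")
print(net.related("rose", "is-a"))
```

Each stored item has the form of two ideas joined by a named relation, so that a question about an idea can be answered by following the links that carry the relevant label.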

The memory is used with the help of re- 
trieval strategies that trace a path in the net- 
work, guided by the retrieval cues that in- 
here in the tasks or aims. This memory uti- 
lization may be superimposed upon a spon- 
taneous activity which is modified by the 
search activity and which itself may modify 
that activity. The strategies are, at least in 
part, compiled on the basis of the memory 
itself and of the task or aim at hand. 

This model is mainly the outcome of con- 
structive efforts that have sought to establish 
a system manifesting capabilities that are es- 
sential in human memory. These efforts have 
been guided in part by psychological hypoth- 
eses, such as those of spontaneous activity; 
they have been guided mostly, however, by 
the demands put up by the goal—construction 
of a useful memory store—and by the kind of 
data involved: complex, structured, and inter- 
dependent information. The model does show 
large similarity, however, with recent psy- 
chological theories on human memory, such as 
those of Bower (1967), Kintsch (1970), 
Morton (1969, 1970), Norman (1968; Nor- 
man & Rumelhart, 1970) and Atkinson and 
Shiffrin (1968; Shiffrin, 1970; Shiffrin & 
Atkinson, 1969). The similarity is not co- 
incidental: most of these theories have been 
strongly influenced by the computer pro- 
gramming work discussed in this review. 

As far as empirical evidence is concerned, 
it can be said that the model accounts for a 
number of major phenomena of human mem- 
ory activity. Central to the model is the 
A,R,B structure as the basic mode of infor- 
mation representation. This hypothesis finds 
support in the increasing evidence concerning 
the importance of imagery, or, generally, of 
mediators in paired-associates learning (e.g., 
Adams & Montague, 1967; Bower, 1969; 
Ehri & Rohwer, 1969; Montague, Adams, & 
Kiess, 1966). More direct support comes 
from the evidence Bower (1967) has pro- 
duced for his multicomponent theory of the 
memory trace, which embodies a similar hy- 
pothesis. 

The conception of a relational network fits 
in well with the work of Deese (1965) on 
association structures. It is in entire agree- 
ment with his general thesis that information 
utilization is based upon relations between 
meanings, and upon meaningful relations, 
rather than upon contiguity relations between 
words. Meaning is treated by Deese in essen- 
tially the same way as in the model, and as- 
sociative overlap follows immediately from 
proximity in the network. On the other hand, 
his laws of association could not have been 
predicted from the model. 

The hypothesis of a network with a near 
total lack of structural differentiation is 
plausible as a psychological theory; direct 
evidence would be hard to come by, however. 
There is even some support for the notion 
that the relations of concepts with classes and 
properties (rose-flower or rose-red) are rep- 
resented differently from those between inde- 
pendent concepts (rose-tulip or rose-nursery) 
(Kempen, 1970). 

Strong support is available for the aspect 
of the memory model represented by the 
EPAM model of recognition and acquisition. 
Reference has been made in the relevant sec- 
tions to the number of psychological phe- 
nomena reproduced by that program. Indeed, 
the quantitative correspondence of the simu- 
lation results with those of psychological ex- 
periments is on the whole quite satisfactory 
—as close or closer than those of other psy- 
chological theories (Simon & Feigenbaum, 
1964). The postulates of limited-capacity 
buffer storage and of unit-processing time 
(both needed to make the model work) both 
have considerable psychological plausibility, 
the latter giving a place in the model to 
consolidation processes. 

In addition, Hintzman (1967) has tested 
several hypotheses derived from his SAL adap- 
tation of EPAM in experiments designed for 
the purpose, and with largely positive results. 
On the other hand, important empirical objec- 
tions may be raised against EPAM as a model 
of recognition processes in general, and a pan- 
demonium model may be preferable (Neisser, 
1967, Chapter 3). 


Input transformation as a preparation for 
recognition and storage is a plausible hypothe- 
sis for psychology. Neisser (1967, Chapters 
3, 4, 5) reviewed the evidence in connection 
with figural and speech inputs. The evidence 
for the verbal loop hypothesis may be cited as 
additional support. Morton (1969) gave a 
detailed model in connection with word recog- 
nition. Interaction between input transforma- 
tion and the information already stored also 
may well be a major feature of human cogni- 
tion. Described under the name of “analysis- 
by-synthesis,” this interaction is deemed the 
essential aspect of speech perception by sev- 
eral authors (Liberman, Cooper, Shankweiler, 
& Studdert-Kennedy, 1967; Stevens & Halle, 
1967), although the degree of activity and 
cycling in this interaction is a matter of dis- 
pute (Morton, 1969, 1970). In the percep- 
tion of objects a similar interaction occurs, as 
appears mainly from the effects of context or 
frequency of previous experience upon coding 
and recognition thresholds (cf. Neisser, 1967). 
Jongman (1967) similarly invoked such inter- 
action to explain chess perception. 

Recall, in the model, is highly dependent 
upon retrieval cues and their nature. This 
central place of the retrieval cue in recall 
corresponds with its importance as demon- 
strated by Tulving and associates (Tulving, 
1971; Tulving & Osler, 1968; Tulving & 
Pearlstone, 1966). The essential difference be- 
tween recognition and recall, as emphasized 
by Kintsch (1970), parallels that between re- 
trieval with a literally matching cue and 
with an insufficient or incomplete and non- 
matching cue in the present study—the latter 
conditions requiring active search strategies. 
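
The contrast between the two retrieval conditions can be sketched as follows. This is a hedged illustration only: the store `LINKS`, the functions `recognize` and `recall`, and the breadth-first search are assumptions made for the example, not the mechanism of any particular program discussed above.

```python
from collections import deque

# Hypothetical toy store: node -> list of (relation label, node) links.
LINKS = {
    "party": [("held-at", "garden")],
    "garden": [("contains", "rose")],
    "rose": [("has-color", "red")],
}


def recognize(cue):
    """Literal match: the cue itself addresses a stored node."""
    return cue in LINKS


def recall(cue, wanted_relation, max_steps=10):
    """Search outward from the cue until a link with the wanted label is found.

    Returns (target, number of links followed), or (None, None) on failure.
    """
    queue = deque([(cue, 0)])
    seen = {cue}
    while queue:
        node, depth = queue.popleft()
        for relation, target in LINKS.get(node, []):
            if relation == wanted_relation:
                return target, depth + 1
            if target not in seen and depth + 1 < max_steps:
                seen.add(target)
                queue.append((target, depth + 1))
    return None, None


print(recognize("rose"))
print(recall("party", "has-color"))
```

With a matching cue a single lookup suffices; with a nonmatching cue the answer is reached only after several links have been followed, which is the sense in which such conditions call for a search strategy.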

The importance given in the model to such 
information-retrieval strategies is in line with 
much recent experimentation and interpreta- 
tion (e.g., Bower, 1971; Greeno, 1970; 
Kintsch, 1970; Meyer, 1970; Shiffrin, 1970). 
Effects of organization in presented lists are 
easily interpreted as results of more efficient 
search strategies (Bousfield, 1953; Tulving, 
1967). According to the observations of Bart- 
lett (1932), or to the protocols of thinking- 
aloud experiments (DeGroot, 1966; Frijda, 
1967; Reitman, 1970; Selz, 1922), human in- 
formation-retrieval activity is permeated with 
the application of complex information-retrie- 
val strategies. Immediate production of knowl- 
edge in recall would seem an exception rather 
than the rule, even in relatively simple recall 
tasks, and it would seem the rule only in the 
simpler of recognition tasks. For Selz and 
Bartlett, in fact, the distinction between re- 
membering and problem solving is a gradual 
and imprecise one. 

Inference processes in particular appear to 
play an important part in recall; they would 
appear to merit much more emphasis than 
they commonly receive. Thinking-aloud pro- 
tocols (Selz, 1922) are the main sources for 
this opinion; but so is the effect of using class 
relations upon recall performance, as evident 
in category clustering (Bousfield, 1953; Un- 
derwood, 1964) or in recall facilitation pro- 
duced by categorizable lists (Cofer, 1966; 
Mandler, 1967; Tulving & Pearlstone, 1966). 
Other evidence comes from the experiments 
by Collins and Quillian (1969, 1970a, 1970b). 
The importance of inference processes in re- 
membering is underscored by some evidence 
in the development of memory (Piaget & 
Inhelder, 1968). In fact, the "spontaneous" 
improvement with age of specific recollec- 
tions that they describe is easily explained in 
terms of the model: elaboration of the mean- 
ing of the various relations will automatically 
improve recall. Piaget's schemata, inciden- 
tally, are obviously equivalent to the more 
logical of the relations as described above. 

The possibility to derive problem-solving 
methods from the data structure has some 
interesting psychological implications. Since 
the composition of the data store may deter- 
mine, to a large extent, which operations are 
performed at each point of the problem-solv- 
ing course, this course is highly dependent 
upon what the relevant data happen to be 
composed of. The variety of problem-solving 
procedures usually found with human sub- 
jects in one and the same problem-solving 
task (Elshout & Frijda, 1966), and the 
subtlety with which they appear to capitalize 
upon quite specific properties of the actual 
problem situation, thus become intelligible. 

This is about as far as the supporting evi- 
dence goes. For many aspects the psycho- 
logical evidence against which to test the 
model is lacking or, at least, fragmentary; 


such is the case, for instance, with the acqui- 
sition of complex factual information, or with 
the factors determining cue effectiveness in 
cued recall, or with the factors determining 
recall priorities at a given network node. Con- 
versely, the simulation model, as it is collec- 
tively embodied by a number of different 
programs, is as yet too general to permit 
many specific predictions to be made. Where 
such specific predictions are made, as in the 
work by Collins and Quillian (1969, 1970a, 
1970b), it is on the basis of additional and 
specific assumptions. In their case, the as- 
sumptions were derived from Quillian’s simu- 
lation model. It would seem that, on the 
whole, many of the specific assumptions in 
the general theories mentioned earlier in this 
section can be fitted into the general model 
sketched here. 

The value of at least some of the model's 
basic notions is attested by the success of sim- 
ulation: in the first place, the effective pro- 
duction of some of the major phenomena and, 
in particular, the emergence of correct answers 
to factual questions in nontrivial ways. It is 
true, however, that the significance of these 
successes should not be overestimated. The 
various  results—inferences, synthesis of 
strategies, effective problem solving, generali- 
zation, and interference phenomena, etc.— 
have been obtained in different programs. It 
is impossible to predict what will happen when 
someone undertakes to unite the principles of 
each program into one big one: it is unclear 
what problems this may engender and which 
incompatibilities may become manifest. More- 
over, all programs have, so far, operated only 
with very small information stocks; the larg- 
est contained 850 ideas (Quillian, 1967), and 
most programs employed considerably less. 
Their functioning may well collapse if larger 
stocks have to be processed. Search strategies 
may well prove completely insufficient or, at 
least, considerably too slow to be either feasi- 
ble or plausible: 


generalizing from current small-scale experiments to 
language processing systems based on dictionaries 
with thousands of entries . . . may entail a new 
order of complexity and require the invention and 
development of entirely different approaches to 
semantic analysis and question answering [Sim- 
mons, 1970, p. 15]. 

On the other hand, simulation work begins 
to manifest its usefulness as a stimulator of 
experimental research; moreover, the research 
that has been stimulated tends to support the 
model's notions. Reference has already been 
made to Hintzman’s (1967) experimental 
work. Collins and Quillian (1969) have found 
correlations between the relation of a concept 
and its properties in a hierarchy of classes 
and the reaction time for questions concerning 
those concepts; they also (Collins & Quillian, 
1970a, 1970b) tested other consequences of 
the use of class inference, in particular those 
of storage according to class hierarchies. 
Meyer (1970) further explored hypotheses 
concerning storage principles inspired by the 
same general orientation. Gilson and Abelson 
(1965) have examined a consequence of the 
model for the assessment of a statement's 
credibility, and other studies are performed in 
close correlation with the development of the 
program for belief manipulation (e.g., Abel- 
son & Kanouse, 1966). Finally, Bower 
(1970) performed experiments on the effect 
of organization in learning explicitly to ad- 
duce evidence for some of the aspects of 
storage structure. 
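
The kind of prediction tested in the class-hierarchy studies can be sketched as a count of the class links that must be followed before a property is found. The hierarchy, the properties, and the function `levels_to_property` below are illustrative assumptions; the sketch shows only the counting, not the reaction-time measurements themselves.

```python
# Hypothetical class hierarchy: each concept points to its superordinate class.
SUPERCLASS = {"canary": "bird", "bird": "animal"}

# Each property is stored once, at the most general class to which it applies.
PROPERTIES = {
    "canary": {"is yellow"},
    "bird": {"can fly"},
    "animal": {"has skin"},
}


def levels_to_property(concept, prop):
    """Count the class links followed before prop is found; None if it is absent."""
    levels = 0
    node = concept
    while node is not None:
        if prop in PROPERTIES.get(node, set()):
            return levels
        node = SUPERCLASS.get(node)  # move up one class; None at the top
        levels += 1
    return None


for prop in ("is yellow", "can fly", "has skin"):
    print(prop, levels_to_property("canary", prop))
```

Under storage of this kind, a question about a property held at a more remote class requires more inference steps, and the hypothesis is that the time needed to answer rises accordingly.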

The model shows several important defi- 
ciencies. These deficiencies all concern the 
absence of properties, which, so far, have not 
been demanded by the requirements of task 
performance, but which happen to occur in 
human memory. Short-term memory has been 
neglected in all programs but EPAM (al- 
though separate simulation work on short-term 
memory has been done; e.g., J. Reitman, 
1970). If the systems were to operate in real 
time, however (as indeed EPAM is supposed 
to do), the need for buffer storage would im- 
mediately emerge. Forgetting in long-term 
memory is the second major omission; the pro- 
grams manifest little of it. Mechanisms for for- 
getting have been indicated, but have not yet 
been integrated into a fact-retrieval pro- 
gram. Where some realistic degree of forget- 
ting is obtained (Hintzman, 1968), this is 
with the help of gratis suppositions such as 
that of a push-down stack. The same holds 
for everything in memory that is time depen- 
dent: no estimate of old or older can be 
made, or any consequence of it produced, ex- 


cept by means of an extra “time-line,” as in 
Reitman’s (1965) program. 

Realistic simulation of acquisition difficulty 
seems not easy to obtain, the existence of 
EPAM's mechanisms notwithstanding. As 
suggested earlier, in order to implement such 
mechanisms in a memory for complex infor- 
mation, that information may well have to 
be represented with a considerably finer 
grain than has been done so far; the ele- 
mentary notions themselves of the present 
programs should be represented by extensive 
networks. Addressing at the technical level, 
too, may well have to be profoundly modified 
before realistic simulation of these aspects is 
possible. 

At the level of understanding much is left 
to be done. For one thing, no program rejects 
a question such as, “is the idea green?” as 
meaningless. No program is meaningfully able 
to say no to an incorrect question of fact, 
such as, “is an apple square?" The entire 
manipulation of denial or mutual exclusion 
(“is the apple square?” “no, it is round") 
is, where it exists, as in Colby's or Black's 
programs, primitive and preprogrammed. Yet, 
even in these connections the model may 
appear useful, by bringing these matters 
sharper into focus as cardinal aspects of 
human information storage and information 
manipulation, and leading to promising ex- 
plorations such as those by Collins and 
Quillian (1970b), Meyer (1970), and others. 


SPONTANEOUS REMISSION: 
FACT OR ARTIFACT? 


LEO SUBOTNIK 1 


Veterans Administration Hospital, St. Cloud, Minnesota 


The evidence for the hypothesis that sufferers from nonpsychotic emotional 
disturbance generally recover without professional psychotherapy is found to 
be vitiated by contaminating artifacts and unvalidated assessment procedures. 
Improvement has not been found to be a function of the passage of time. 
Reported remission rates have varied from 37% to 78%, and no specific re- 
mission rate can be presumed as a comparison for the effects of treatment. 
Untreated patient groups in natural settings, because of selective factors, 
cannot be considered comparable to treated patient groups and cannot sub- 
stitute for a controlled research design. Any adequate analysis of the course of 
treated and untreated psychological difficulties must take account of the 
fluctuation hypothesis, that is, cyclical manifestations of severity arising 
from exogenous or endogenous factors. 


In psychotherapy, a field of clinical psy- 
chology where there is so much controversy, 
we should, perhaps, be grateful for the few 
beacons of established fact on which re- 
searchers have been able to reach a consensus. 
One such fact appears to be that emotionally 
disturbed persons improve in the course of 
time without benefit of psychotherapy. Truax 
(1967), for example, wrote that “ ‘spontan- 
eous remission’ is a well-established fact in 
most patient populations [p. 158]"; Bergin 
(1967) told us that “It has been frequently 
replicated, and is now a well-established fact, 
that control Ss [subjects] who do not receive 
psychotherapy change positively as a group 
with the passage of time [p. 140]." 

We also know what the spontaneous remis- 
sion rate is. Eysenck (1952, 1961) and 
Levitt (1957b) cited several studies to show 
that this rate is about 70% with adults and 
children (Denker, 1947; Landis, 1937; Lehr- 
man, Sirluck, Black, & Glick, 1949; Shep- 
herd & Gruenberg, 1957; Witmer & Keller, 
1942). Subsequent studies (Schorer, Low- 
inger, Sullivan, & Hartlaub, 1968; Wallace 
& Whyte, 1959) have supported this figure. 

1 The writer gratefully acknowledges his indebted- 
ness to Marjorie Gerads, Medical Librarian at the 
St. Cloud Veterans Administration Hospital, whose 
assistance was indispensable in the preparation of 
this review. He also thanks Donald Bamber, William 
Klett, and Charles Watson, of the St. Cloud Veterans 
Administration Hospital Psychology Service, for 
their critical readings of the manuscript. 

Since the various reports available of the 
success rates of traditional (insight) psycho- 
therapies generally do not equal or exceed 
this figure (Eysenck, 1952, 1961; Landis, 
1937; Levitt, 1957b) and since controlled 
studies have failed to demonstrate convinc- 
ingly any gains resulting from psychotherapy, 
it seems clear that traditional psychothera- 
pies are completely ineffectual. 

Now that Eysenck (1967) has cited a new, 
large-scale study (Cremerius, 1962) of the 
results of traditional psychotherapy, we are 
in a position to move the whole argument a 
step further. Since this 10-year follow-up of 
600 cases shows only 25% achieving and 
maintaining improvement, we must now be 
alerted to the implication that the traditional 
psychotherapies are not only expensive but 
are doing immense harm, since they are pre- 
venting 45% of the patients from otherwise 
improving! 

As a consequence of the discovery of this 
general phenomenon of spontaneous remis- 
sion, a knotty theoretical problem, eloquently 
delineated by Mowrer (1950) as the "neu- 
rotic paradox," the problem of why self-de- 
feating neurotic behavior should persist, has 
almost vanished. There may remain instances 
that require explanation, but apparently we 
were mistaken in believing that neurosis was 
resistant to extinction. 

A few voices have been raised along the 
way to resist the rising tide of argument and 
evidence (Cartwright, 1955; DeCharms, 
Levy, & Wertheimer, 1954; Luborsky, 1954; 
Rosenzweig, 1954). The most recent of these 
(Kiesler, 1966) seems curiously quixotic: 
Kiesler expressed the remarkable hope that 
his “refutation will bury [the spontaneous re- 
mission ‘myth’] permanently [p. 114].” 

It may thus seem superfluous to tender 
another evaluation of the argument and evi- 
dence at this late date. A striking circum- 
stance, however, tempts one to do so; the 
phenomenon of spontaneous remission, so well 
established now in the thinking of most psy- 
chologists, has been almost entirely re- 
searched by nonpsychologists without benefit 
of sophisticated research designs or sophisti- 
cated statistical analyses. Further, Eysenck 
and Levitt, in reviewing the evidence, failed 
to consider problems of comparability, grind- 
ing disparate reports and surveys into a sta- 
tistical sausage, concealing more than it re- 
veals. Eysenck averaged surveys of treatment 
outcomes ranging from 41% to 77% "im- 
provement"; Levitt's range was even greater 
(43% to 97% at follow-up). Both Eysenck 
and Levitt rested their "spontaneous remis- 
sion” base rate (70%) on two reports apiece 
with dubious comparability to each other or 
to the reports of treatment. 

In the discussion to follow, the spontane- 
ous remission hypothesis is understood to 
postulate that there is a general tendency of 
persons to recover from nonpsychotic emo- 
tional disturbances with the passage of time 
without the intervention of a professional psy- 
chotherapist. The benign processes, whatever 
they may be, should be more evident the 
more opportunity they have had to manifest 
themselves. The rate of improvement of un- 
treated patients would be expected to be a 
“monotonic function of time [Eysenck, 1961].” 
That certain unspecified sufferers under some 
unspecified circumstances improve or recover 
without such intervention is not in dispute. 
The issue is whether spontaneous remission 
is common or rare. The causes of such re- 
mission are also not at issue because they are 
not specified by the hypothesis; possible 
causes include a learning or extinction proc- 


ess, the cessation of external stress, or the 
obtaining of “informal psychotherapy” from 
nonprofessionals such as family, friends, 
teachers, or ministers. If there is a general 
tendency toward recovery because of benign 
factors that come into play with the passage 
of time, then the necessity or benefit of pro- 
fessional psychotherapy is, of course, in ques- 
tion. 

Two different questions, which have been 
confounded since Eysenck’s first statement on 
this issue, must be distinguished. The ques- 
tion of whether a general phenomenon of 
spontaneous remission occurs in emotional 
disturbances is not the same as whether psy- 
chotherapy produces discernible favorable ef- 
fects that are greater than any spontaneous 
improvement, though the questions are re- 
lated. With respect to the first question, our 
interest lies in whether the evidence adduced 
to affirm the existence of spontaneous remis- 
sion is convincing or attributable to artifacts. 
The second question raises the problem of 
whether, artifacts or not, the untreated pa- 
tient groups that have been studied can serve 
as control groups against which to test the 
effects of psychotherapy. This distinction 
should be borne in mind even though, in the 
discussion to follow, it will sometimes be 
convenient to consider the two questions to- 
gether in reviewing certain of the studies. 

The present argument first reviews the 
studies upon which Eysenck initially predi- 
cated his assertion of the spontaneous remis- 
sion base rate. These studies of Eysenck’s 
cannot be compared to studies of improve- 
ment in psychotherapy, since neither patient 
characteristics nor improvement criteria are 
comparable to the psychotherapy situation. 
Some data are considered, suggesting that 
psychotherapy candidates are often persons 
who have failed to recover “spontaneously.” 

Studies reporting spontaneous remission 
since Eysenck’s first statement are noted to 
vary considerably in the rates reported and 
to rely on unvalidated clinical judgments with 
unreported reliability. The failure to find a 
relationship between remission and the pas- 
sage of time is pointed out. 

The use of these later reports, which have 
studied untreated patients from clinic waiting 
lists, as rough controls for treated patients is 
also questioned, since the patients may have 
“defected” from treatment because they 
thought they were improving. 

The widespread impression that control 
groups from psychotherapy studies have 
shown spontaneous remission is then reexam- 
ined and found to have little substantiation. 

Finally, the hypothesis is offered that psy- 
chological disabilities tend to fluctuate in 
severity rather than disappearing, a phenom- 
enon that confounds both the attempts to 
evaluate spontaneous remission and to eval- 
uate the effects of psychotherapy. 


EYSENCK’S BASE RATE 


Eysenck’s (1952) initial statement of the 
spontaneous remission hypothesis, as Kiesler 
(1966) has pointed out, rested wholly on two 
reports of uncontrolled and minimal survey 
data (Denker, 1947; Landis, 1937). The 
Denker report was a tabulation of disability 
benefits paid to patients of general practi- 
tioners, who considered the illnesses to be 
psychoneurotic but treated the patients them- 
selves rather than referring them to special- 
ists. Over the course of time, most of these in- 
surance claims were discontinued, 72% with- 
in 2 years. The Landis study reported that 
two-thirds of patients diagnosed psychoneu- 
rotic were discharged as recovered or im- 
proved within a year from New York State 
hospitals (in 1914) and from United States 
hospitals (in 1933). (Eysenck’s hypothesis 
refers to the course of nonpsychotic emotional 
disorders, the prime target of psychoanalytic 
and other traditional psychotherapies.) 

Criticisms of Eysenck’s use of these studies 
may be grouped into three main categories: 
(a) the criteria of improvement are not 
clearly meaningful or comparable to those 
used by psychotherapists; (b) the patients 
are not comparable to those seen by psycho- 
therapists; (c) the patients may have re- 
ceived informal “psychotherapy” in some 
sense. The latter type of objection offers little 
solace to those who wish to vindicate the con- 
tribution of the professional psychotherapist, 
but the other objections are cogent. 

The untrustworthiness of the criteria for 
improvement has been amply noted. The use- 
fulness of the insurance claims criterion was 
negated by Cartwright (1955), who showed 
that the estimated period of the survey 
(1933-1944) was coincident with the emer- 
gence of the United States from the great eco- 
nomic depression of the 1930s; the discon- 
tinuance of the claims may simply have re- 
flected the increasing employment opportuni- 
ties. As for Landis’ data on the discharges 
from mental hospitals, such discharges are 
often more contingent on social factors in the 
patient’s environment (a receptive family, 
financial circumstances, community tolerance, 
a job waiting) than on his psychological 
status. The very diagnosis of psychoneurosis 
in itself implies an intention to discharge. The 
criterion of discharge is not directly compa- 
rable to the criteria for improvement used by 
psychotherapists, since hospitalized patients 
who seek psychotherapists will often see 
them after discharge. The meaningfulness of 
the Denker and Landis criteria is also brought 
into question by the implication that Landis’ 
hospitalized neurotics “recovered” more rap- 
idly than Denker’s outpatients, who were 
presumably less ill! 

The fact that the patients in these two 
studies are not necessarily like those seen by 
psychotherapists, and thus cannot provide a 
base rate for recovery without psychotherapy, 
seems equally clear. Hospitalized mental pa- 
tients can hardly serve as a comparison group 
for psychotherapy outpatients. The outpa- 
tients of the Denker report were not a group 
whom psychotherapists treat, since they were 
persons who neither chose nor were referred 
for psychotherapy. Did the physicians be- 
lieve they did not need it? 

Most psychologists who have accepted the 
fact of spontaneous remission appear to have 
done so primarily on the basis of Eysenck’s 
use of these two studies. Other pertinent stud- 
ies since Eysenck’s original statement have 
not been widely noted. 


In later statements by Eysenck (1961, 
1965, 1966) an additional study is cited 
(Shepherd & Gruenberg, 1957), which upon 
examination seems more confusing than clari- 
fying. The authors gave data on neuroses 
gathered from the Health Insurance Plan of 
Greater New York. They took the proportion 
of enrollees at any given age who received 
service for neurosis for the first time in 1951 
and who had no such benefit for the 3 preceding 
years to represent the "incidence." They com- 
pared this with the average proportion at that 
age serviced in the years 1948-1951 (the 
"prevalence"). Their essential point appears 
to be that if the illnesses are lingering, the 
claims would accumulate progressively with 
the rise in age much beyond the number an- 
ticipated from the average incidence of new 
cases. They argued that since the curves plot- 
ted from their data showed no such pattern 
(see Figure 1), but rather that “the preva- 
lence curve . . . is only slightly higher than 
the incidence curve, it follows that the aver- 
age duration of these illnesses is between one 
and two years [pp. 263-264]." It may be 
noted from their graph, however, that their 
data are so unstable that females up to the age 
of 25 years are shown to have a higher inci- 
dence of new cases in 1951 than their average 
total cases for the 1948-1951 period. In other 
words, there were more new cases in 1951 
than the estimated total number based on 
prior years. 


FIG. 1. For enrollees in Health Insurance Program, average annual prevalence 
in 1948-1951 and incidence of “new cases” in 1951 of psychoneuroses for 
which services were given. (Reprinted with permission from an article by 
Michael Shepherd and E. M. Gruenberg published in The Milbank Memorial 
Fund Quarterly, 1957, Vol. 35. Copyright by the Milbank Memorial Fund, 
1957. The prevalence curves reflect the average annual experience over the 
years 1948-1951 of all Health Insurance Plan enrollees with 12 months of 
coverage in any one of those calendar years. There was a total of 60,302 
person-years of exposure over this period; 2,714 of these person-years were 
characterized by the existence of one or more services related to mental illness. 

The new case curves show the experience in 1951 of 6,643 enrollees who had 
entered the Health Insurance Plan by January 15, 1948, were still in the plan 
on December 31, 1951, and had not received service related to psychoneurosis 
in 1948, 1949, or 1950.) 
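
The duration argument attributed to Shepherd and Gruenberg appears to rest on the familiar steady-state approximation that prevalence is roughly the product of incidence and average duration. The sketch below is offered under that assumption only; the function name `implied_duration_years` and the rates are made up for illustration and are not the Health Insurance Plan figures.

```python
def implied_duration_years(prevalence_rate, incidence_rate):
    """Average duration implied by prevalence = incidence x duration.

    Assumes a steady state; both rates must refer to the same population base.
    """
    if incidence_rate <= 0:
        raise ValueError("incidence rate must be positive")
    return prevalence_rate / incidence_rate


# Made-up rates per 1,000 enrollees per year, for illustration only.
print(implied_duration_years(prevalence_rate=45.0, incidence_rate=30.0))

# When new cases exceed total cases, the implied duration falls below one year.
print(implied_duration_years(prevalence_rate=20.0, incidence_rate=25.0))
```

A prevalence only slightly above the incidence thus yields an implied duration somewhat above one year, whereas an incidence above the prevalence yields a value below one year, which shows how sensitive the inference is to unstable data.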

No data are presented as to how these 
cases were treated, whether by psychotherapy 
or other means. The authors seem to refer to 
the cases as "untreated": "From these data 
it is perfectly clear [sic] that, in the mass, 
neuroses must have a limited course even if 
untreated . . . [Shepherd & Gruenberg, 1957, 
p. 264]." In any case, the discontinuance of 
service under the Health Insurance Plan can 
hardly be taken as termination of the illness, 
since patients may discontinue with equal 
reason because they are not benefiting or 
because they have recovered. By such cri- 
teria all patients recover, and all treatments 
are 100% effective. 

The reports adduced by Eysenck to es- 
tablish a spontaneous remission base rate are 
thus quite useless as a basis for comparison 
with results of psychotherapy. Their criteria 
for improvement are ambiguous and not 
comparable to those used by psychotherapists. 
Also, the patients are such as rarely reach a 
psychotherapist, especially a psychoanalyst. 

Among the innumerable selective factors 
that becloud the search for a remission base 
rate, a crucial one is the possibility that 
those who seek psychotherapy have already 
exhausted the hope of spontaneous remission 
or palliative measures. The experience of 
many psychotherapists is that patients, be- 
cause of the social stigma, the anticipated 
duration or expense of treatment, and other 
factors, do not seek them out lightly. They 
may have suffered for years, sought physical 
causes and remedies, and come to the psycho- 
therapist as a last resort. This selective factor 
will vary widely from one setting to another, 
of course. On a college campus where the 
service is free and the social stigma is mini- 
mal, a greater proportion of transient diffi- 
culties may be expected. In a veterans ad- 
ministration mental hygiene clinic restricted 
to serving veterans with “service-connected” 
psychiatric disabilities, a transient disorder 
may never be seen. An expensive psycho- 
analyst offering long-term treatment is unlikely
to be dealing with transient difficulties.

Some data at hand suggest that this variable
of prior duration requires attention. A
study of obsessional states (Pollitt, 1960)
found that the average period between onset 
of the main illness and seeking treatment was 
7.5 years. Half were seen within 2 years of 
the onset of symptoms and nearly a quarter 
after 10 years. A report from a psychiatric 
clinic in Edinburgh (Giel, Knox, & Carstairs, 
1964) found that 67% of the men and 45% 
of the women at intake had had previous 
psychiatric illness; only 28% of both sexes 
had had their symptoms less than 3 months. 
Children seen at a municipal clinic in the 
1920s had histories of behavior difficulties 
dating back an average of 6 years for boys, 
5 years for girls (Robins, 1966, p. 154). An- 
other clinic (Masterson, Tucker, & Berk, 
1963) judged that most of the children with 
thinking disorders (68.7%) and acting-out 
behavior (70.6%) had manifested the illness
for more than 4 years. Only girls with hysteria
were usually (66.6%) seen less than 2
years after the onset of their difficulties. 


"PUSH-OUT" PATIENTS AS BASE RATE
GROUPS


Several studies reported since those cited 
by Eysenck have appeared to solve the problem
of finding untreated patient groups comparable
to those treated by psychotherapists.
They have focused on patients who applied to
a clinic for treatment and were placed on a
waiting list but were never treated, either be- 
cause the clinic never followed up with an 
offer or because the patients were no longer 
interested when therapy became available. 
These patients at least appeared to be in the 
same pool from which some psychotherapists 
draw their patients. A serious question remains,
however, which is discussed later, as
to whether these “push-outs” are indeed
comparable to those patients who receive 
treatment. 

These studies all employed follow-up contacts
and relied on clinical judgments of improvement.


Studies of Adults 


Considerable variation has been found in
the rate of improvement of untreated groups.
Wallace and Whyte (1959) followed up patients
who had applied for treatment at a
clinic, were placed on a waiting list, but were
never offered treatment. They were able to 
evaluate 49 patients, mostly diagnosed neu- 
rotics, who had been on the waiting list from 
3 to 7 years. The authors did not report that 
any patients had sought treatment elsewhere. 
In the judgment of the interviewers, 65.3% of
these patients (approximately the remission
figure offered by Eysenck) had “improved.”

Two other studies, however, obtained re- 
mission rates considerably at variance with 
Eysenck's. 

Saslow and Peters (1956) followed up 83 
patients, who applied to a clinic for treat- 
ment, received one or two evaluation inter- 
views, and were placed on a waiting list but 
were never treated. The follow-up period
ranged from 1 year and 4 months to 6 years
and 8 months, but for 80% of the patients 
the lapse was between 4 and 6 years. Diag- 
noses ranged from neurosis to psychosis and 
mental deficiency, but 90% were called neurotics.
The patients were evaluated as to improvement
by the follow-up interviewers and
were also asked to evaluate themselves. Only 
37% of the patients were judged by the 
interviewers to be improved, and only 47% 
of the patients judged themselves to be im- 
proved. 

Endicott and Endicott (1963) reported on 
a group who were kept waiting 6 months for 
psychotherapy as part of a research design. 
They were rated on a number of scales and 
given a battery of psychological tests (Ror- 
schach, MMPI, TAT, and Draw-A-Person), 
but only the results of an “Evaluation of Improvement
Scale” rated by the senior author
were reported. Of the 40 subjects, 40% were
considered “improved” and the rest “unimproved.”

The variation of these remission figures, 
based on judgments of improvement, renders 
dubious the acceptance of any general base 
rate at this point. These differences may be 
due to differences in patient characteristics, 
to differences in standards for improvement 
from one study to another, or to the unreli- 
ability of clinical judgments. Whatever the 
sources of the discrepancies, we cannot presume
the remission base rate to be known
for any given group of patients.

We have some evidence from groups of
treated patients that diagnosis or symptomatology
are likely to make a considerable difference
in prognosis. For example, about 60%
of a group of patients with obsessional states
were reported to be recovered or much improved
after a mean follow-up of 3½ years
(Pollitt, 1960), but only 21% of phobic 
patients improved after a mean of 23 years 
(Errera & Coleman, 1963). Hastings (1958) 
reported remission rates after 6 to 12 years 
for a variety of diagnostic categories. His 
rates for neurotics varied from 72% for anxi- 
ety states and 70% for reactive depressions 
to 53% for obsessive compulsives, 51% for
hypochondriacs, and 51% for mixed neuroses.
A 30-year follow-up of child patients (Rob- 
ins, 1966) assigned long-term diagnoses based 
on information about their adult years. Sev- 
enty-five percent of manic depressives were 
considered improved but only 39% of sociopaths,
21% of neurotic depressions and undiagnosed
neurotics, 19% of the anxiety neuroses,
and 20% of the hysterics. None of the
hysterics (N = 20) were considered fully recovered
(Robins, 1966, p. 223). Differential
treatment effects (for those conceding the 
possibility of treatment effects) may explain 
some of this variability, but the possible dif- 
ferences inherent in the syndromes are ob- 
trusive. The discrepancies from one study to 
another for similar diagnoses also raise again 
the issues of comparability of standards 
among such reports and of the reliability of 
clinical judgments. 


Studies of Children 


The problem of evaluating spontaneous re- 
mission of childhood disturbances is more 
complex than with adult ones. In contrast to 
adult neurosis, both popular belief and clin- 
ical lore hold that the child may often “grow 
out of” his particular difficulties or symp- 
toms, that they represent difficulties in man- 
aging the shifting demands and stresses of 
development that may often be resolved suc- 
cessfully without professional intervention. 
Maturity is generally expected to bring 
greater capacity for control and modulation, 
greater tolerance for frustration, and better 
skills for managing the environment so as to 
obtain satisfactions. Consequently, we might
expect a considerably higher remission rate
than for adults. It is most astonishing, then, 
to be informed by Levitt (1957b) that the 
spontaneous remission rate for children is al- 
most identical with Eysenck's figure for adults 
(Eysenck: 72%; Levitt: 72.5%). 

Further, there are special difficulties in 
evaluating the changing status of disturbances 
in children, since symptoms may shift consid- 
erably with development to become more age 
appropriate. A wildly aggressive 4-year-old 
boy may be a tense and overcontrolled one 
at 10 years. Truanting and petty thievery in 
childhood may herald later alcoholism (Rob- 
ins, 1966). Failure to recognize such shifts 
may spuriously inflate estimates of improve- 
ment or contribute to unreliability of judg- 
ments. 

Levitt (1957b) based his estimate of the 
spontaneous remission rate for children on 
patient groups, for whom parents had sought 
treatment at a clinic and been put on a wait- 
ing list, but whose parents refused services 
when eventually contacted. The two studies, 
on which his estimate is based, were different 
in important respects. One, a 1-year follow-up 
(Lehrman et al., 1949), reported 70% im- 
proved of a control group of children accepted 
for treatment at a clinic but withdrawn by 
their parents. The developmental factors, 
alluded to earlier, complicate the other (Wit- 
mer & Keller, 1942) because of the long 
follow-up period, 8 to 13 years. By the time 
of reevaluation, 76% were no longer children 
but were 18 years or older. This study re- 
ported 78% improved. The report does not 
make clear why these children were diag- 
nosed only and not treated. A third study 
(Morris & Soroker, 1953) was rejected by 
Levitt for the purpose of estimating the spon- 
taneous remission rate because many of the 
presenting problems appeared quite minor. 
Nevertheless, the proportion of improved or 
recovered is consistent with the others—78% 
at a follow-up evaluation from several weeks 
to 6 months later. We know many of the 
problems were minor only because the authors 
went into some detail; we do not know that 
the other studies were different in this re- 
spect. Morris and Soroker reported from a 
typical urban clinic. The circumstances in 
this study resembled those for the adult
“push-outs,” that is, the patients were placed
on a waiting list (for several weeks to 6 
months), and the parents were not contacted 
further or declined treatment when later 
contacted. 


Problems of Validity 


What confidence can we have that the pro- 
portions of “improved” reported for these 
patient groups, accepted but untreated by 
clinics, signify a valid process of spontaneous 
remission? 

All of these studies depend on two highly 
fallible sources of data, that is, interviews 
quantified by ratings or global clinical judg- 
ments. Although there is some evidence that 
such judgments can be reliable (although 
they often are not) and that they can have 
validity (though there is considerable varia- 
bility on this dimension among judges; Holt, 
1965; Matarazzo, 1965), any given study 
resting on such data must establish its own 
standing on these fundamentals. None of 
these studies offers any evidence to support 
the presumption of validity or even reports 
data on the reliability of the judgments. We 
have no way of estimating to what extent we 
are simply dealing with error. 

If the amount of judged improvement were 
shown to increase with the passage of time,
there would be some assurance that the judg- 
ments were not totally random and might be 
construed to support the existence of a re- 
mission phenomenon. As is noted in the next 
section, however, the studies that take note 
of the time factor report no such relationship. 

If we were to assume that the judgments 
were completely at random, we would arrive 
at a statistic for the remission rate very close 
to that proposed by Eysenck and Levitt. In 
several of these studies, patients are sorted
into three categories: much improved, somewhat
improved, and unimproved or worse. On
a completely random basis, utilizing the cate- 
gories equally, two-thirds, of course, would be 
classified in the two improved or remission 
categories. Even without assuming such data 
to be precarious in this extreme, Windle 
(1962, pp. 139-142) offered an argument, 
based on an assumed reliability of .57, that
a statistical regression effect deriving from
misclassification would produce remission figures
closely corresponding to those that have
been reported. In any event, in the forced 
judgment conditions of most of the studies, 
the categories for improvement will be as- 
signed some proportion, valid or not. 
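
The arithmetic behind the two-thirds figure can be checked with a short simulation. The sketch below is an illustration only and is not taken from any study cited here. It assumes a hypothetical rater who assigns each patient to one of the three categories with equal probability; the function name, sample size, and seed are arbitrary choices.

```python
import random


def random_improvement_rate(n_patients, seed=0):
    """Share of patients placed in an 'improved' category when a
    hypothetical rater picks one of three categories at random."""
    rng = random.Random(seed)
    categories = ["much improved", "somewhat improved", "unimproved or worse"]
    improved = 0
    for _ in range(n_patients):
        # Two of the three equally likely categories count as improvement.
        if rng.choice(categories) != "unimproved or worse":
            improved += 1
    return improved / n_patients


rate = random_improvement_rate(100_000)
print(f"share rated improved by chance alone: {rate:.3f}")
```

Under these assumptions the share classified as improved settles near two-thirds, although the simulation contains no change in any patient.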

The misleading effect conveyed by figures 
offered in the absence of some estimate of 
error may perhaps be seen more clearly in a 
concrete example offered by this writer.2 No 
significant difference emerged between test 
and retest scores on the Cornell Medical In- 
dex Health Questionnaire of 59 patients in a 
general medical practice, identified by initial 
scores on this measure as emotionally dis- 
turbed but receiving no professional psycho- 
therapy. However, a simple tally of those 
whose follow-up scores changed in the direc- 
tion of improvement counted 61% "im- 
proved." Even by a more rigorous criterion, 
a favorable shift of 5 points or better, 50% 
“improved.” The fact that there was no reli- 
able shift of the total group would have been 
quite obscured by the simple frequency count 
typically employed.
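
The contrast between a frequency count and a test of the group shift can be sketched in a few lines. This is a hypothetical simulation, not a reanalysis of the Cornell Medical Index data: it assumes each patient has a fixed true score, that test and retest differ only by independent measurement error, and that lower scores mean fewer complaints. The score distribution, error size, and sample size are invented for illustration.

```python
import random
import statistics


def tally_versus_mean_shift(n_patients=20_000, error_sd=8.0, seed=1):
    """Simulate test and retest scores when nothing has truly changed."""
    rng = random.Random(seed)
    changes = []
    for _ in range(n_patients):
        true_score = rng.gauss(40.0, 10.0)            # stable true score
        test = true_score + rng.gauss(0.0, error_sd)  # first administration
        retest = true_score + rng.gauss(0.0, error_sd)
        changes.append(retest - test)
    # A drop in score is tallied as "improved".
    share_improved = sum(c < 0 for c in changes) / n_patients
    mean_change = statistics.fmean(changes)
    return share_improved, mean_change


share_improved, mean_change = tally_versus_mean_shift()
print(f"tallied as improved: {share_improved:.1%}")
print(f"mean change in score: {mean_change:+.2f}")
```

In this sketch about half of the simulated patients are counted as improved while the mean change stays near zero, which is the pattern a simple tally would conceal.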

To the extent that the judgments of im- 
provement depend on the reports of the 
adult patients themselves or the parents of 
the child patients, they are vulnerable to dis- 
tortions produced by shifts in the purposes of 
the informants. All of the studies of untreated 
waiting-list patients depended upon such in- 
formants with the partial exceptions of Endi- 
cott and Endicott (1963), which included 
psychological tests, and the studies of chil- 
dren by Witmer and Keller (1942) and 
Lehrman et al. (1949), which sought to ob- 
tain corroborating information where applica- 
ble from the referring social agency or from 
court and other public records. 

It is to be expected that applicants for 
treatment will selectively emphasize their
symptoms and difficulties in their initial 
presentation, especially if they are trying to 
press upon the clinic the urgency of their case. 
A quite different situation obtains at the 
follow-up interview after they have been put 
off for so long, rejected for practical purposes.
The process of reconciling the cogni-
tive dissonance between having asked for 
help and since doing without help will be a 
confounding factor at the later point. Denial, 
“sweet onions” rationalizations, and accom- 
modation to the inconvenience of the symp- 
toms can be expected to influence the re- 
sponses of the interviewees. These factors may 
convey an impression of improvement with- 
out any valid change in the seriousness of the 
psychological difficulties. Similar artifacts are 
implicit, of course, in the evaluation of the 
effects of treatment. 

A follow-up study of untreated patients 
that does not resolve the problems of the validity
of patients' reports and of clinical ratings
but shifts the focus of inquiry somewhat
by providing a comparison group is that by 
Schorer et al. (1968). In following up a 
clinic waiting list 2-8 years later of persons 
accepted for psychotherapy but untreated, 
they found that some had obtained treatment 
elsewhere. They compared 55 patients who 
remained untreated with 41 who had ob- 
tained treatment elsewhere to determine 
whether the treated group were benefited. No 
significant difference was found in the propor- 
tion improved—65% of the untreated and 
78% of the treated. As for degree of improve- 
ment, the treated group actually showed less 
amelioration of symptoms than the untreated 
group. 
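
One way to see why a gap of 65% versus 78% can fail to reach significance with groups of this size is a pooled two-proportion z test. The sketch below is an assumption-laden illustration: the source does not say which test Schorer et al. used, and the counts of 36 of 55 and 32 of 41 are reconstructed here from the reported percentages rather than taken from the original report.

```python
import math


def two_proportion_z(improved_a, n_a, improved_b, n_b):
    """Pooled two-proportion z statistic and two-sided normal p value."""
    p_a = improved_a / n_a
    p_b = improved_b / n_b
    pooled = (improved_a + improved_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / std_error
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Assumed counts: 36 of 55 untreated (about 65%) and 32 of 41 treated
# (about 78%). Only the percentages and group sizes appear in the text.
z, p_value = two_proportion_z(36, 55, 32, 41)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

With these assumed counts the statistic falls short of the 1.96 required at the .05 level, in line with the report of no significant difference.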

The reported improvement figures for both 
groups are subject to the same validity prob- 
lems noted earlier, namely, defense mecha- 
nisms of the informants and the unreliability 
of clinical judgments. Schorer et al., in
spite of their title (“Improvement Without 
Treatment"), did not address themselves to 
the problem of establishing the validity of 
the improvement but rather to testing the 
efficacy of therapy. Strategies to evaluate 
psychotherapy may not have to dispel the 
doubts about artifacts, though it would be 
helpful, as long as the sources of error are 
equated in the treated and control groups. 
The question of whether accepted but un- 
treated patients from a waiting list can serve 
as adequate controls for a treated group, that 
is, whether all other factors are equal but 
treatment, is considered in a subsequent section.


FACTOR OF TIME


At this point, however, notice must be 
taken of the finding by Schorer et al. that 
the proportion of cases improved did not in- 
crease with the passage of time. The spon- 
taneous remission hypothesis implies that the 
passage of time permits benign processes, 
whatever they may be, to have their effect, 
which should be more evident the more op- 
portunity they have had to manifest them- 
selves. The rate of improvement of untreated 
patients would be expected to be a “mono- 
tonic function of time [Eysenck, 1961, p. 
707]." That is the reason for Eysenck’s re- 
jection of the waiting period as control in 
the Rogers and Dymond (1954) study; that 
is, the time periods for control and treatment 
were unequal. If no effect of time is found, 
other explanations of the reported improve- 
ment rates, such as the artifact hypothesis, 
are more tenable. 

No study of untreated patients, except for 
the Denker (1947) study of insurance claims, 
which has been discredited as evidence on 
this point, has shown an effect of time on the 
improvement rate. Wallace and Whyte 
(1959), like Schorer et al. (1968), also found 
no such effect; they concluded that the im- 
provement must have entirely taken place 
within 3 years, their minimum follow-up time. 
Other reports have not attended to this issue. 

This writer’s aforementioned study (see 
Footnote 2), however, was explicitly addressed
to this factor. He reported test-retest
data on the Cornell Medical Index Health
Questionnaire for 59 patients of general prac- 
titioners who had initially scored above a 
cutting point indicating emotional disturbance 
but had received no professional psychotherapy
(i.e., by a psychologist, psychiatrist, or
psychiatric social worker). He compared five 
groups of subjects classed according to the 
amount of time elapsed between test and 
retest, ranging from less than 1 year to more 
than ^ years, and applied an analysis of 
variance. There were no reliable differences
as a function of time such as the spontaneous
remission hypothesis would predict.

Data from a similar study of college students,
based on three groups of untreated students
who were judged by clinicians to be
"disturbed" on the basis of their initial
MMPIs, also suggest no relationship between
apparent improvement and the amount of 
time elapsed before the follow-up MMPI 
(Subotnik, in press). 


WAITING-LIST PATIENTS AS CONTROLS


Even though the reported improvement 
rates of patients accepted for therapy but 
placed on a waiting list have questionable 
value as indicators of remission, such groups 
would, on first consideration, seem to serve as 
acceptable comparison groups for the evalua- 
tion of treated patients, since artifacts would 
presumably be equated. They have the im- 
mense advantage over those groups cited by 
Eysenck in that they have at least applied for 
and been accepted for psychotherapy, thus 
reducing the possibilities of selective factors 
distinguishing them from the therapy patient 
groups. Although some studies have simply 
reported on the untreated groups in isolation, 
several have attempted to make direct com- 
parisons with treated patients at the same 
clinics and with the same raters and evaluation
procedures (Lehrman et al., 1949; Levitt,
Beiser, & Robertson, 1959; Schorer et al.,
1968). 

Levitt (1957a, 1957b, 1958a, 1963) has 
vigorously argued for the comparability 
of these groups. He compared “defectors”
from a child guidance clinic with “remain- 
ers,” who stayed for at least 20 treatment 
interviews, on 61 variables, including facts 
about history and symptoms and judgments 
of severity and prognosis, and concluded that 
the two groups did not differ (Levitt, 1957a). 
In another report (Levitt, 1958a), he found 
that mental health professionals could not, on
the basis of the clinic records, detect a differ- 
ence in severity of symptoms between the two 
groups. On the other hand, differences have 
been found at other clinics between the two 
groups by Witmer and Keller (1942), Lehrman
et al. (1949), and Ross and Lacey
(1961) in symptoms and parental characteristics
and by Lake and Levinger (1960) in
socioeconomic status. 

Whether or not the various differences
found are reliable or of practical significance,
a crucial question remains unsettled: Do patients
fail to follow through on their treatment
or fail to seek treatment elsewhere after
a waiting period because they are, or believe 
they are, improving? If that is the reason, 
then the patients who do pursue treatment 
are less likely to be those suffering from 
transient difficulties. In a telephone follow-up 
of attrition in a child guidance center, Inman 
(1956) was told by 35% of the respondents 
that the child had improved. In another study 
at the same clinic (Levitt, 1958b), 16% gave 
improvement or lack of seriousness of the 
child’s difficulties as the prime reason for 
breaking off (the number for whom this might 
have been a secondary reason or rationaliza- 
tion was not ascertained). For adults, 
Schorer et al. (1968) found that the patients 
on the waiting list who sought treatment else- 
where did not differ on most variables from 
those who did not except for three items on 
which they were less favored: percentage of 
last year employed, occupational adjustment, 
and thinking disorder. 

We are still in the dark, anyway, about 
what indicators might be prognostic of remis- 
sion. Schorer et al. (1968) found only three 
variables that distinguish, only to a weak 
degree, the untreated patients who improved 
from the untreated patients who did not: the 
ones who improved were more frequently mar- 
ried, more frequently living with their 
spouses, and scored higher on a rating of oc- 
cupation. In her 30-year follow-up of child 
guidance patients, Robins (1966) reported 
that except for the father's probable psychi- 
atric diagnosis, “other aspects of the child- 
hood environment, the kind, severity, and 
number of childhood symptoms, and child- 
hood experiences with the law were all unre- 
lated to eventual remission [p. 233]." Not 
even the father's diagnosis was predictive of 
improvement short of remission.

It thus appears difficult, and perhaps im- 
possible, to use attrition groups of patients for 
evaluating the effects of therapy on treated 
groups because of selective factors, including 
the one at issue, namely, a tendency toward 
real or imagined remission without treatment. 


CONTROL GROUPS AND REMISSION


Bergin’s (1967) remark, cited at the outset
of this study, asserts that controlled studies
of psychotherapy have regularly shown
that the untreated groups change positively
with the passage of time. 

The demonstration of improvement with 
the passage of time requires repeated mea- 
surements or evaluations in order to take ac- 
count of regression artifacts confounding the 
simple assessment-reassessment procedure. If 
subjects are chosen for treatment on the basis 
of deviant scores on some measure or mea- 
sures, the statistical phenomenon of regres- 
sion toward the mean could be expected to 
manifest itself. (If initial scores on the mea- 
sures are not deviant, it is not meaningful to 
expect “improvement” in terms of them.) A 
valid remission process, however, would be 
expected to be more evident with the passage 
of time and thus with repeated measurements 
(cf. the earlier discussion on the factor of 
time). Controlled studies of psychotherapy 
typically follow an assessment-reassessment 
design, trusting regression effects in control 
and treatment groups to cancel each other. 
Moreover, the control period itself is generally 
rather short, a matter of a few months. Thus 
any apparent control group gains, inferred 
simply from a comparison of their initial and 
second measurements, may reflect merely re- 
gression effects. 
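
The regression artifact described here can be made concrete with a small simulation. The following is a minimal sketch under invented assumptions: true scores never change, each measurement adds independent error of the same size as the spread of true scores, higher scores indicate more disturbance, and subjects are selected when their first score reaches a hypothetical cutoff. None of these values comes from the studies under review.

```python
import random
import statistics


def regression_to_mean(n_people=50_000, cutoff=65.0, seed=2):
    """Select high scorers on a first test and compare their retest."""
    rng = random.Random(seed)
    first_scores, second_scores = [], []
    for _ in range(n_people):
        true_score = rng.gauss(50.0, 10.0)          # stable; no true change
        first = true_score + rng.gauss(0.0, 10.0)   # initial assessment
        second = true_score + rng.gauss(0.0, 10.0)  # reassessment
        if first >= cutoff:                         # chosen as "disturbed"
            first_scores.append(first)
            second_scores.append(second)
    pairs = zip(first_scores, second_scores)
    share_lower = sum(s < f for f, s in pairs) / len(first_scores)
    return (statistics.fmean(first_scores),
            statistics.fmean(second_scores),
            share_lower)


first_mean, second_mean, share_lower = regression_to_mean()
print(f"mean at selection: {first_mean:.1f}")
print(f"mean at retest:    {second_mean:.1f}")
print(f"share scoring lower at retest: {share_lower:.1%}")
```

In this sketch the selected group scores lower on reassessment even though no true score has moved, which is why a single assessment-reassessment comparison cannot by itself establish remission.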

But are there even any apparent control 
group gains? Bergin (1967) appears to be- 
lieve that the reason for the negative results 
of many psychotherapy studies is that the 
control subjects appear improved on reevalu- 
ation. A close look at the evidence indicates 
that even this is not the case; there are gen- 
erally not even any significant regression 
effects. 


Adult Control Groups 


Consider first the principal studies of in- 
dividual psychotherapy with nonpsychotic 
adults. 

Shlien, Mosak, and Dreikurs (1962), who 
have reported the clearest results in the lit- 
erature supporting the effectiveness of psycho- 
therapy, used as an index the correlation be- 
tween self and ideal Q sorts. There was no 
change in their controls, who were applicants 
for psychotherapy kept waiting for 3 months. 

Grummon (1954) compared applicants for 
psychotherapy, who were put on a waiting list 
for 60 days, with a group of presumed normals.
Changes during this period were compared
on five measures: a self-ideal Q sort
correlation, a Q-adjustment score, the Willoughby
Emotional Maturity Scale, a Self-Other
Attitude Scale, and ratings based on
the Thematic Apperception Test. Only on the
Self-Other Attitude Scale did the therapy
applicants change more than the normals, and 
this change was in an unfavorable direction. 
Grummon concluded that “the hypothesis 
that motivation for therapy brings about con- 
structive personality changes as a function of 
time alone [p. 254]” was not supported. The
“attrition” group, those psychotherapy can- 
didates who declined treatment after the wait- 
ing period, did improve significantly on the 
Q-adjustment score. This finding underscores 
the questions raised earlier about the ade- 
quacy of such a group to provide a remission 
base rate against which to compare those who 
actually enter therapy. 

Barron and Leary (1955), in their study of 
psychotherapy, used a control group of pa- 
tients whose waiting period averaged 7 
months. On the MMPI, the evaluation in- 
strument used, only Pd among the clinical 
scales (L and Es also changed) showed a sig- 
nificant difference (.05 level). These changes 
make no particular sense for a group of neu- 
rotics, for whom the neurotic triad (Hs, D, 
and Hy) and the Pt scale would be expected 
to be the crucial ones. The decrease in the L 
scale and the increase on Es was interpreted 
by the authors to signify an increased amena- 
bility to psychotherapy. The implications of 
the change in Pd were not discussed and are 
puzzling; Pd did not change in the groups 
treated by individual or group therapy (but 
the neurotic scales did). 

An elaborate study in a veterans administration
hospital (Fairweather, Simon, Gebhard,
Weingarten, Holland, Sanders, Stone, &
Reahl, 1960), comparing the effects of several 
different treatment programs on several dif- 
ferent diagnostic groups (nonpsychotic, 
short-term psychotic, and long-term psy- 
chotic), included a nonpsychotic control 
group, who had work assignments and plan- 
ning assistance for posthospital living but no 
special psychotherapy. The average length of 
the control period was 86 days. A variety of 
measures was employed, including the MMPI,
a Ward Behavior Rating Scale, the Holland
Vocational Preference Inventory, self-ideal Q 
sort correlations, five special Q sort scales 
(emotional control, conformity, self-accept- 
ance, social adaptability, and self-confidence), 
and the TAT. In general, because analysis of 
variance was employed, the results are re- 
ported as comparisons with the treatment 
groups and do not specify amount of change 
over the control period. The authors specifi- 
cally reported, however, no change for the 
controls on the MMPI generally, a deteriora- 
tion on the Ma scale, and a lower full-time 
employment rate on follow-up. The one positive
change explicitly reported was the TAT.
Speaking of the controls generally (short-term
psychotics and long-term psychotics as
well as nonpsychotics), the authors stated that
“when change scores are viewed globally, the
generic conclusion, that the three treatments
involving psychotherapy demonstrate significantly
more absolute change than the control,
seems warranted [p. 20].” The study cannot
be considered as support for the occurrence of
spontaneous remission among the control patients.

In a controlled study of brief psychotherapy
(Morton, 1955), which reported demonstrable
treatment effects of a few carefully
focused interviews with 20 of 40 selected students
referred by vocational counselors as
maladjusted, the control subjects failed to
improve on a pooled global rating of adjustment
after 90 days. They did improve in
scores on an Incomplete Sentences test,
though not as much as the treated group, a
change which the experimenter attributed to
regression artifact. The controls also reduced
their complaints on the Mooney Problem
Check List, but the probability level was not
reported. The three measures had low intercorrelations,
the only significant one being between
the pooled global rating and the Problem
Check List (r = .38). The control group does
appear to improve on two of these measures,
then, but their meaningfulness is not clear,
and the changes may simply reflect regression
effects.

Dymond (1955) reported on 19 subjects
who were required to wait 2 months before
entering psychotherapy. They did not improve
on the TAT. A Q-adjustment
measure showed that the eight “attriters”
(who declined treatment after the wait pe- 
riod) tended to improve, but not the others. 
The nature of the improvement, however, 
seemed different from that shown by success- 
fully treated clients in Rogerian therapy: 
they manifested more strength to “go it 
alone” rather than more openness and self-understanding.
Such a change may reflect
merely increased defensiveness, which may or 
may not be adaptive or durable. 

Perhaps the only report on a control group 
that clearly offers some evidence for spon- 
taneous remission, because it considers the 
time factor, is that of Cartwright and Vogel 
(1960). The waiting period ranged from 3 to 
24 weeks, averaging 8.8. The Q-adjustment 
scores did not improve, but the TAT scores 
did. A more detailed analysis suggested that 
there was more likely to be a positive TAT 
change when the wait was longer (8-24 
weeks) than shorter (3-7 weeks). The meaning
of this change is not clear, and it is discordant
with the results of Dymond (1955),
from the same facility, described above. The 
clients in this study, at a university counsel- 
ing center, may not have been as seriously 
disturbed as patients in other clinics or in 
private practice and may have had less hesi- 
tation in seeking treatment for difficulties that 
subsequently proved to be transient.

Rather tangential, because it dealt with a 
minor problem (anxiety over public speak- 
ing), for which the college student subjects 
did not themselves seek treatment, is a report
of treatment and 2-year follow-up (Paul,
1967; Paul & Shannon, 1966) comparing desensitization
and insight approaches with an
"attention placebo" group and untreated controls.
The controls did not appear to improve
beyond chance expectancy (based on the standard
error of the measurement, 22% improved
at the .05 level of significance). Half of the attention
placebo group, however (the same as
the insight therapy group), improved at the
.05 level. This result may have some relevance
to the spontaneous remission issue, though it is
hard to interpret. The attention placebo experience
could serve as a paradigm for the
kind of beneficial experiences, apart from professional
therapy, alleged to occur with the
passage of time, giving rise to “spontaneous
remission.” Does this phenomenon require the
presence of a professional therapist, even if it 
does not require his techniques, in order to 
set up expectancies of change? At any rate, 
this report cannot offer any direct information 
on the hypothesized spontaneous remission 
phenomenon because the subjects do not rep- 
resent a patient population for which psycho- 
therapy is ordinarily applied. 

In summary, despite a smattering of in- 
dexes with positive shifts but equivocal mean- 
ing, the studies of individual psychotherapy 
with nonpsychotic adults offer little support 
for the assertion of a general phenomenon of 
spontaneous remission among the control 
group. 


Child Control Groups 


Studies of psychotherapy with children
have generally been ignored by reviewers, who 
have felt themselves sufficiently occupied with 
the task of digesting the evidence on adult 
psychotherapy (e.g., Strupp & Bergin, 1969, 
p. 21). The interesting point thus seems to 
have escaped notice that in contrast to the 
spotty results of adult therapy studies, con- 
trolled studies of child psychotherapy have 
been almost uniformly favorable with respect 
to the effects of treatment. Levitt’s reviews
of child therapy (1957b, 1963) did not 
bring this point out since he omitted consid- 
eration of controlled studies. Two that he did 
mention he did not identify as such: He used 
the control group alone of Lehrman et al. 
(1949) in estimating the spontaneous remis- 
sion rate and averaged a careful study by 
Dorfman (1958) with a heterogeneous col- 
lection of other reports of treatment results. 

The Lehrman et al. (1949) study, which 
was favorable to psychotherapy, is the only 
one that appears to support a remission process
among the untreated patients. Since it
used an attrition group as controls and clini- 
cal judgments of improvement as evaluation 
measures, the results rest on criteria of un- 
known validity and reliability and are vul- 
nerable to the distortions of the interview sit- 
uation discussed earlier. (The study by Lev- 
itt et al. [1959], which compared a defector 
group on 26 variables at follow-up, did not 
deal with gains over time.) 

TABLE 1

COMPARISON OF THE ADJUSTMENTS OF CHILDREN
IN THE TREATMENT GROUP AND CONTROL
GROUP AT FOLLOW-UP

Adjustment at        Treatment group      Control group
follow-up               n       %            n       %

Success                99     50.5          35     31.8
Partial success        46     23.5          42     38.2
Failure                51     26.0          33     30.0
Total                 196    100.0         110    100.0

Note. Reprinted with permission from an article by L. J.
Lehrman, H. Sirluck, B. J. Black, and S. J. Glick, published in
"Success and Failure of Treatment of Children in the Child
Guidance Clinics of the Jewish Board of Guardians." Copyright
by the Jewish Board of Guardians, 1949.


The results of Lehrman et al. (1949), in 
fact, seem more compatible with the interpre- 
tation of artifacts among the controls than 
with the spontaneous remission hypothesis. 
These investigators classified their subjects 
into “success,” “partial success,” and “failure”
categories. The superiority of treatment
was demonstrated in the “success” category. 
The remission of the untreated subjects was 
evident principally in the "partial success” 
category. 
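
The percentages in Table 1 follow directly from the reported counts. As a minimal sketch (the variable and function names are illustrative and do not come from Lehrman et al.), the following Python fragment recomputes them and the percentage-point differences between the groups:

```python
# Counts as reported in Table 1 (Lehrman et al., 1949).
# The dictionary layout and names are illustrative assumptions.
counts = {
    "treatment": {"success": 99, "partial success": 46, "failure": 51},
    "control": {"success": 35, "partial success": 42, "failure": 33},
}


def percentages(group):
    """Return each category's share of the group total, in percent (1 decimal)."""
    total = sum(group.values())
    return {cat: round(100 * n / total, 1) for cat, n in group.items()}


treated = percentages(counts["treatment"])
control = percentages(counts["control"])

# Percentage-point difference (treatment minus control) for each category.
difference = {cat: round(treated[cat] - control[cat], 1) for cat in treated}

for cat in treated:
    print(f"{cat:16s} treated {treated[cat]:5.1f}%  "
          f"control {control[cat]:5.1f}%  diff {difference[cat]:+5.1f}")
```

Read this way, the treated group's advantage is concentrated in the "success" category, while the control group exceeds the treated group only in "partial success," which is the pattern the argument above relies on.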

Dorfman (1958) reported a careful study, 
in which she treated children at a public ele- 
mentary school and equated the untreated 
controls on age, sex, and test scores (Rogers 
Test of Personality Adjustment and a Mean 
Adjustment Rating on a Sentence Comple- 
tion Test). After 23 weeks the controls did 
not improve significantly on either score, al- 
though the treated group improved on both. 


The measures were uncorrelated. The treated 
group was also eva] 


therapy period cor 
vacation. They di 
on the Rogers Tes 
on the Sentence C; 
13-year follow- 


ment period and were 
up. No support for re- 


ratings to evaluate outcome, reported that 
scores of the control group did not improve 
after a 19-month interval. The treated sub- 
jects improved in adjustment on the peer rat- 
ings, and the treated aggressive subjects im- 
proved on teacher ratings of approp 
whereas the untreated aggressive subjects did
not. These results emerge from a very small 
number of subjects, eight experimental and 
eight control subjects, half of whom were 
classed as aggressive.

A study of brief therapy with institutionalized
mentally retarded boys (Subotnik &
Callahan, 1959), in which the eight subjects
were reevaluated after a 2-month
period, found no gains on three measures of
anxiety. There were also no demonstrable
treatment effects.

Cox (1953) reported gains on the TAT and
a sociometric measure for a treated group of
nine children in an Australian orphanage but
no gains for a matched control group after 2
weeks. Not a single child in the control group
achieved a higher adjustment index on the
TAT or on the sociometric choices made by
the other children. These two measures were
selected because they correlated best with
composite ratings of adjustment based on
them and two other measures (a social adjustment
questionnaire and interviews with
the child care workers).

Two other studies with results favorable to
therapy concerned the effect of treatment on
reading retardation. In one (Bills, 1950),
there were no gains during a 6-week control
period, though there were during a similar
treatment period. In the other (Seeman &
Edwards, 1954), a control group made no
apparent reading gains after 4 months,
though a treated group did. Two tests of adjustment
were also used but demonstrated no
advantage for the treated group. It is apparent
from inspection of the reported data that the
control group did not gain on the Rogers
test. Apparent gains in peer ratings were not
submitted to a statistical test of significance
since the investigators were interested only
in the control-experimental group comparisons.

In these studies of individual psychotherapy
with children, like the studies with adults,
there seems little to sustain the contention
that the control groups manifest a remission
process.


FLUCTUATION HYPOTHESIS 


A hypothesis, which recommends itself on 
the basis of clinical experience and which 
must be considered in any attempt to evaluate 
the course of psychological disturbances and 
the effects of treatment on them, is that the 
seriousness of the manifestations waxes and 
wanes over the course of time, probably in 
conjunction with the exigencies of external 
stress but possibly as a result of endogenous 
factors (Lesse, 1964; Wilder, 1956). It would
be important not only to determine at what 
point in such fluctuations the treatment in- 
tervenes (Wilder, 1956) in order to interpret 
immediate effects correctly but also to con- 
sider the long-range effects on such patterns. 

Examples of reports confirming the exist- 
ence of stress or periodicity in the course of 
psychological disturbances are readily ob- 
tained. Hoehn-Saric, Frank, Stone, and Imber 
(1969) found that improvement was related 
to the presence of evident stress at the onset 
of the illness. Pollitt (1960) reported that 
more than half the obsessional patients he 
studied had had one or more prior attacks. 
Previous psychiatric illness was recorded for 
67% of male and 45% of female neurotic 
outpatients by Giel et al. (1964). Only 20% 
of child patients followed up 30 years later, 
compared to 52% of the control subjects, 
were found to have been well throughout 
adulthood (Robins, 1966). After a 5-year 
follow-up, only 19% of a group of psycho- 
neurotics had changed the nature of their 
neurotic complaints (Friess & Nelson, 1942). 

It is likely that careful attention to the 
fluctuation phenomenon will delineate impor- 
tant differences in magnitude and etiology 
among the disorders.

There are those fluctuations that constitute 
a hallmark of the symptom picture, as in
cyclothymic personality disorders, in which 
depressed or hypomanic episodes or both are 
of limited duration but recur. 

Another kind of fluctuation arises as a
response to external stress, which plays a pre- 
dominant role in precipitating many psycho- 
logical disorders and quite possibly is capable 
of exacerbating almost all. As external stress
passes or is resolved and the coping resources
of the sufferer are less taxed, amelioration or 
remission of the condition may be observed. 
Recurrences may appear in response to some 
subsequent stress. 

Some have marshaled evidence to the effect 
that there are ubiquitous spontaneous rhythms 
as a pervasive phenomenon of nature, which 
will inevitably influence the course of any 
disorder (Chassan, 1957; Lesse, 1964; Wilder, 
1956). The evaluation of any intervention 
must take account of whether it occurs on 
the upswing or downswing (Wilder, 1956). 
Still another source of fluctuations is the un- 
reliability of assessment techniques (Chassan, 
1957). 

In light of these various kinds of fluctua- 
tion it is evident that the usual evaluation 
procedure, which simply compares the status 
of the patient at two points in time, the first 
when he presents himself for treatment and 
the second at some later point, will be fre- 
quently misleading. The apparent remission 
will often be transitory and sometimes spuri- 
ous. 
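
The point can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that severity follows a regular sinusoidal cycle around a fixed baseline; neither the functional form nor the numbers come from any study cited here. It shows how the same unchanging course yields an apparent remission, an apparent deterioration, or no change, depending only on when the two assessments fall:

```python
import math


def severity(t, baseline=50.0, amplitude=10.0, period=12.0):
    """Hypothetical symptom severity at time t (months): baseline plus a regular cycle."""
    return baseline + amplitude * math.sin(2 * math.pi * t / period)


def two_point_change(t_first, t_second):
    """Change in severity between two assessments (negative = apparent improvement)."""
    return severity(t_second) - severity(t_first)


# Intake at the peak of a cycle (t = 3), follow-up at the trough (t = 9).
print(round(two_point_change(3, 9), 6))
# Intake at the trough (t = 9), follow-up at the next peak (t = 15).
print(round(two_point_change(9, 15), 6))
# Assessments exactly one full cycle apart (t = 3 and t = 15).
print(round(two_point_change(3, 15), 6))
```

Under these assumptions the first comparison registers a 20-point "improvement" and the third registers none, although the underlying course is identical in both. Only repeated assessments would reveal the cycle.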

Repeated assessments over an adequate 
time perspective are necessary to discern the 
pattern of fluctuation. Genuine remission in 
treated or untreated cases would be reflected 
in delivery from these fluctuations or in 
attenuation of their magnitude or frequency. 
In those cases where external stress is salient, 
the pattern of fluctuation will be irregular and 
contingent on circumstance, but some way of 
evaluating vulnerability to stress will be re- 
quired. If psychotherapeutic intervention is 
worthwhile in these cases, it will be mani- 
fested in the increased ability of the patient 
to withstand subsequent external pressures. 

The indications that remission of some 
untreated patients may be distinguishable in 
important respects from remission produced 
by successful psychotherapy (Dymond, 1955) 
ought to enter into any adequate evaluation 
of treatment. Whatever other criteria one 
may wish to apply in judging the value of one 
sort of remission against the other, one would 
surely wish to know whether the patients are 
equally fortified against relapses. 

In sum, until the course of psychological
disorders is studied from a long-term perspective
with attention to fluctuation phenomena,
the evaluation of both “spontaneous remission”
and the effects of psychotherapeutic
intervention will continue to be vulnerable to 
these confounding factors. 


CONCLUSIONS 


1. No general phenomenon of “spontaneous
remission” has been established. Existing
reports suffer from contaminating artifacts 
and unvalidated clinical judgments. There is 
no evidence that improvement is a function 
of time, as the hypothesis requires. 

2. Treated patients cannot be logically com- 
pared against a presumed remission rate de- 
rived from untreated patients, except in the 
context of a controlled study. Too many se- 
lective factors in patient characteristics and 
evaluation procedures distinguish treated and 
untreated groups in uncontrolled settings. 

3. “Push-out” patients, placed on a waiting 
list but never treated, the most commonly 
used comparison groups, may differ from 
treated patients on many factors, among 
which may be the one at issue; that is, they 
fail to follow through on treatment because 
they believe they are improving. 

4. Any overall remission figure is an arti- 
fact. Remission rates reported for untreated 
patients have varied from 37% to 78%. The 
rates proposed by Eysenck and Levitt are 
each based on averaging two studies widely 
divergent in context. 

5. Any adequate analysis of the course of 
treated and untreated psychological diffi- 
culties must take account of the fluctuation 
hypothesis, that is, cyclical manifestations of 
severity arising from exogenous or endogenous 
factors. 


EARLY INFANTILE AUTISM: A REPLY
TO WARD


LUCIANO L'ABATE 1 


Georgia State University 


The conclusions of A. J. Ward in 1970 concerning the environmental etiology
of early infantile autism are questioned on the basis of an incomplete and
selected coverage of the literature. More recent evidence and different viewpoints
suggest a different, or at least transactional, etiology. Different treatment
approaches, also not covered in Ward's review, may lead toward a more
clear-cut differentiation of this syndrome.


Ward's (1970) overview of early infantile 
autism leaves a great deal to be desired from 
the viewpoint of coverage of the relevant lit- 
erature. Many pertinent and well-known 
studies were ignored that could have added to 
an increased understanding of this specific 
disability. Furthermore, the restricted cover- 
age of the literature biased Ward's conclusion 
toward an environmental emphasis that ap- 
pears prematurely unwarranted by the pub- 
lished evidence. Perhaps some of the evidence 
may have been published after Ward finished 
his review (March 13, 1969). On the other 
hand, there are at least nine references not 
cited by Ward that appeared during and be- 
fore completion of his review (Gittelman & 
Birch, 1967; Graffagnino, Boelhouwer, & 
Reznikoff, 1968; Hermelin & O'Connor, 1968; 
Hewett, 1965; Lotter, 1966, 1967; Metz, 
1965; Ruttenberg, Bertram, Dratman, Fraknoi,
& Wenar, 1966; Rutter, 1968; Wolf²).
Consequently, on the basis of additional evi- 
dence, Ward's final conclusion that early in- 
fantile autism is a syndrome resulting from 
“lack of a varying, novel, patterned stimulation
in the child's developmental history
[Ward, 1970, p. 361]” must be brought into
question. Most of the evidence published be- 
fore and immediately after completion of his 
review supports transactional interpretations 
of this syndrome (L'Abate, 1969a). Furthermore,
Ward cited no direct evidence that
supports his conclusion.

² E. G. Wolf, Autistic children: A study of their responses
to auditory stimuli. Unpublished doctoral
dissertation, School of Education, Boston University,
1968.

First, we should recognize that one of the 
most difficult tasks of differential diagnosis is 
to demonstrate quantitative differences among 
early infantile autism, childhood schizophre- 
nia, brain damage, and severe mental re- 
tardation. Evidence for such differentiation is 
only now beginning to accumulate. Davis 
(1970), for instance, was able to achieve such 
a discrimination at a purely descriptive, 
qualitative level. Wing (1969), on the other 
hand, using a questionnaire for the parents, 
differentiated autism from other sensory motor 
handicaps like receptive and expressive apha- 
sias, partial blindness, and deafness. She was 
unable to find that autism was anything more 
than the result of multiple sensory handicaps. 

Second, the evidence to support the hypoth- 
eses of maternal deprivation or lack of stimu- 
lation is hard to come by. DesLauriers and 
Carlson (1969) were unable to find any pa- 
rental idiosyncrasies in their detailed and pro- 
longed study of autistic children. In not a 
single instance of all the studies cited in 
Ward's note is there a single piece of evidence 
to support his conclusion. In fact, most of 
the indirect evidence marshaled in Bell's 
(1968) review and in the longitudinal study 
by Thomas, Chess, and Birch (1968) sup- 
ports bidirectional, transactional effects from 
the child to the mother based on constitu- 
tional and temperamental factors. As Des- 
Lauriers and Carlson (1969) also suggested, 
we need to look at these constitutional characteristics
in the child before making their
parents “scapegoats” (Schopler³) of our ignorance
and of our biases. Levine and Olson
(1968) were unable to find any differences 
from the general population in the intellectual 
functioning in the parents of three autistic 
children. 

Third, Rutter's (1968) most complete re- 
view presented some of the various viewpoints 
that Ward overlooked without even consider- 
ing—among others, Cain's (1969) review of 
theories concerning special “isolated” abilities 
in severely psychotic children. 

Fourth, the language used by Ward (body 
ego, protective ego barrier, etc.) suggests a 
bias in his coverage of the literature that 
tends to bypass important evidence support- 
ing a possibly organic or at least transactional 
etiology for early infantile autism (DesLau- 
riers & Carlson, 1969; Gittelman & Birch, 
1967; Graffagnino et al., 1968; Hermelin & 
O'Connor, 1968; Ritvo, Ornitz, Markham,
Brown, & Mason, 1969). 

Fifth, by his emphasis on inferred, hypo- 
thetical constructs, Ward failed to consider 
the information-processing qualities of autistic 
versus schizophrenic versus brain-damaged 
children. L'Abate (1969b) has hypothesized 
that autistic children, by their emphasis on 
motor output at the expense of what seems 
to be mutism, would show a superiority of 
visual over auditory input. Schizophrenic chil- 
dren, on the other hand, besides the possibility 
that their input-output relations are disor- 
ganized, may show a greater deficit in visual 
rather than auditory input. Brain-damaged 
children are poorer in both auditory and 
visual and all other input channels, depend- 
ing, of course, on the type of brain damage. 
Wolf (see Footnote 2), for instance, has 
shown, under intensive auditory training
(sounds and music), how autistic children 
improve in their reception of such stimuli. 

Sixth, Ward failed to consider recent, important
follow-ups of schizophrenic children
that have appeared in the last 3 years. Among
them, Goldfarb’s (1970) could be taken as
representative, especially as far as his inability,
as yet, to distinguish between organic
and environmental etiology is concerned.

Seventh, Ward omitted novel therapeutic
approaches based on (a) sensory deprivation
or restricted stimulation, (b) imitation and
observational learning (Metz, 1965), (c) behavior-shaping
procedures, (d) home management,
(e) contingencies and consequences, and
(f) sensory motor retraining.

These are just a few of the inadequacies of
Ward’s (1970) article. It leaves much to be
desired theoretically, etiologically, diagnostically,
therapeutically, and rehabilitatively. It
is unfortunate that a review with so many
deficiencies in scholarly coverage could lead
toward premature, speculative inferences unwarranted
by evidence, adding to the existing
confusion rather than detracting from it.


A number of points of disagreement are raised with A. J. Ward's (1970)
article on early infantile autism. In particular, objections are raised to the
looseness of use of the term "autism," the presentation of Leo Kanner's criteria,
and the assumption of psychogenic causation. Recent studies of the biology
of autism are briefly cited.


I am in basic disagreement with most of 
what Ward has presented in his article titled 
"Early Infantile Autism" (Ward, 1970). 
Rather than attempt a detailed critique, I
limit myself to a few of the more salient
points.

1. While Ward ostensibly deals with "early 
infantile autism," his treatment of the litera- 
ture makes it clear that he is not using the 
term rigorously. As I have repeatedly emphasized
in my book, Infantile Autism (Rimland,
1964) and elsewhere (e.g., Rimland,
1968), most authors use the term “autism” 
so loosely that one cannot simply assume it 
is being used appropriately. In a recently pub- 
lished study of psychotic children seen by two 
or more diagnosticians, I found only 49 out of 
229 children called “autistic” by the first 
diagnostician to be so diagnosed by the second 
one. The others were called “schizophrenic,” 
“emotionally disturbed,” “retarded,” etc. My 
own research confirms Leo Kanner’s estimate 
that only about 10% of the children loosely 
called autistic are in fact cases of early infantile
autism (Rimland, 1971).


2. Ward's reinterpretation of Kanner’s
major criteria for infantile autism is inaccurate
and misleading. Kanner's criteria (see
Kanner's writings or Rimland, 1964) were behaviorally
stated. Ward's use of such concepts
as “lack of object relations” should not
have been presented as though they were
Kanner’s. Further, Ward has misinterpreted
Kanner's second criterion. Ward incorrectly
believes stereotypic motor behavior is a central
aspect of the diagnosis, whereas Kanner
clearly referred to the child's “obsessive insistence
on sameness” in the environment (e.g.,
furniture, books, or toys were not to be
moved).

3. Ward’s writing style tends to be misleading,
although I do not believe it is intended.
For instance, Ward listed six points after a
sentence, “Rimland (1964) pointed out the
following differences from schizophrenia [p.
351].” Most readers would construe this to
mean I had listed only those six differences
(between autism and schizophrenia), whereas
in fact I had enumerated 15 differences.

4. Ward emphasized psychogenic etiology,
although any careful review of the evidence
would make it clear that such a position is
hardly tenable. My own review of the evidence
(Rimland, 1964, Chapter 3; see also
Rimland, 1969) shows the psychogenic hypothesis
to have little or no substance. Further,
although Kanner has been shown to
have been quite correct in observing that
cases of autism tend to have highly intelligent
parents (Rimland, 1964, 1968; Treffert, 1970;
Wing, O'Connor, & Lotter, 1967), Kanner has
been emphatic in denying that he subscribes
to the psychogenic hypothesis: In his address
to the First Annual Meeting of the National
Society for Autistic Children in June 1969,
Kanner said, “(H)erewith I especially acquit
you people as parents. I have been misquoted
many times. From the very first publication
to the last I spoke of this condition in no
uncertain terms as ‘innate.’ ”²

Although it has long been clear that psychogenesis
was not the answer, only recently
has the nature of the biological basis of
autism begun to emerge. Boullin, Coleman,
and O'Brien (1970) have recently reported
a biochemical defect in the blood platelets of
children having classical infantile autism, as
diagnosed by means of my E-2 Diagnostic
Check List. A follow-up study (Boullin, Coleman,
O’Brien, & Rimland, 1971) showed this
defect not to occur in nonautistic psychotic
children. Further, quite in line with the foregoing
studies, my own research (Rimland, in
press) has shown most truly autistic children
to respond very well to a treatment regime
consisting of very high dosage levels of certain
vitamins.


EXPERIMENTAL DESIGNS IN WHICH EACH
SUBJECT IS USED REPEATEDLY


N. KRISHNAN NAMBOODIRI! 
Department of Sociology, University of North Carolina 


In many a field of research, the investigator makes use of experiments in which the
same experimental unit (subject) is used repeatedly. One of the main problems such
experiments pose for the researcher is that the observations may not be all mutually
independent. In the recent literature, several particular forms of statistical
dependence between successive observations on the same subject have been conceptualized,
and the appropriate designs and modes of analysis have been arrived
at. This article reviews the recent writings on the subject.


In such diverse fields as agriculture, animal 
husbandry, education, psychology, market re- 
search, and “social engineering,” researchers 
often use experiments designed in such a way 
that each experimental unit (subject) is used 
repeatedly by exposing the unit to a sequence 
of different or identical treatments. Such de- 
signs are known by different names in the 
literature: Crossover or changeover designs, 
(multiple) “time-series” designs, designs in- 
volving repeated measurements on the same 
subject, etc. The objective of the present study 
is to review the recent works on such designs. 

The following is a simple example of the kind
of designs this study is concerned with: In an 
effort to find out whether the race of the test 
administrator affects the performance of the 
subject in intelligence tests, each of a group of 
subjects of a given race was administered tests 
in succession by persons of different races, all 
persons using equivalent, if not identical, tests, 
and each test being separated far enough in 
time from the others so as to ensure that the 
results of one test did not affect those of the 
succeeding tests. Winer's book (1962, pp.
538-577) contains several examples. 

The main attraction of crossover designs is
that by properly choosing the design it is
possible to make the precision of the estimate
of the treatment differences dependent on
the within-subject variance only, and since the
within-subject variance, under certain conditions,
will be much less than the “between-subject”
variance, the aforementioned property
of crossover designs makes for greater precision
for a given cost, compared with designs in
which each subject is used only once.
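
The precision argument can be illustrated with a small calculation. The sketch below rests on assumptions that are not spelled out in the text: two treatments, an additive model in which each observation is a subject effect plus a treatment effect plus independent error, no carryover, a between-subject variance `sigma_b2` and a within-subject variance `sigma_w2`. The function names are hypothetical. Under that model, the variance of the estimated treatment difference depends on the within-subject variance alone when each subject receives both treatments, but on the sum of both variances when each subject is used only once:

```python
def var_diff_crossover(sigma_w2, n_subjects):
    """Variance of the mean within-subject difference when each of
    n_subjects receives both treatments; subject effects cancel."""
    return 2.0 * sigma_w2 / n_subjects


def var_diff_parallel(sigma_b2, sigma_w2, n_per_group):
    """Variance of the difference between two independent group means when
    each subject is used once; subject effects do not cancel."""
    return 2.0 * (sigma_b2 + sigma_w2) / n_per_group


# Illustrative (made-up) variance components: between-subject 9, within-subject 1.
crossover = var_diff_crossover(sigma_w2=1.0, n_subjects=10)
parallel = var_diff_parallel(sigma_b2=9.0, sigma_w2=1.0, n_per_group=10)
print(crossover, parallel, parallel / crossover)
```

Both designs in this example yield 20 observations, but the parallel design needs 20 subjects where the crossover design needs 10. The gain disappears if the within-subject variance is not appreciably smaller than the between-subject variance, or if the independence assumption fails.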

There are, however, certain special problems
that crossover designs raise. These have to do
with the possibility that the experimental error
components of successive observations on the
same subject may not be statistically independent.
Nonindependence of error components
may arise in one or the other of the
following forms:


1. The effect of a treatment may carry over
into periods following the ones in which the
treatment has been applied. Sheehe and Bross
(1961) described an experiment that illustrates
the above situation. A study was planned in
which five drugs were to be compared in a
crossover design. Each patient was to serve
as his own control. The control treatment
consisted of giving the patient a neutral
drug. A decision had to be made about the
time interval at which the administration of
one drug was to follow that of another. If the
time interval between the administration of
two drugs was to be as long as a week, it was
feared that dropouts, and the incompleteness
of the sequences of treatments thereby resulting,
would create serious problems. It was
therefore desirable to use periods of shorter
duration, such as 3 days. There was no reason
to suspect that by shortening the period the
chemical effects of one treatment would be
influenced by that of the treatment given
previously, since the chemical effect of each
drug lasted only a few hours. But there seemed
to be some kind of a psychological carryover
effect at work. When an “effective” drug was
given after a neutral one was administered, the
former tended to fail to produce any effect. It
seemed as though the patient had lost confidence
in the analgesics when a neutral drug
was given: time (longer duration) alone
seemed to be able to restore the patient's
confidence.

2. The error components of successive 
observations on the same subject may behave 
as random variables with some definite, though 
not necessarily known, law of autocorrelation. 
For example, Williams (1949, 1950) found that 
in a uniformity trial involving one agricultural 
plot, the following discrete stationary random 
function fitted the yields in successive years: 


$E(y_j) = \mu$,
$\operatorname{Var}(y_j) = \sigma^2$,
$\operatorname{Cov}(y_i, y_j) = \rho^{\,j-i}\sigma^2 \quad (j > i)$,
where $y_1, y_2, \ldots$ stand for the observations in
successive years.
3. The error component of one observation 
may be related in a specific manner to the 
corresponding components of the preceding 
observations. Glass (1968) provided an illus- 
tration of the above situation. Monthly data 
on traffic fatalities in Connecticut before and 
after 1956 (the year in which an unprecedented 
severe crackdown on speeding was instituted 
by the state government) were analyzed by 
using an exponentially weighted moving 
average model that takes into account the
nonstationary nature (caused by the 1956
reform) of the time series.
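
The exponentially weighted moving average mentioned in the third form can be sketched in a few lines. This is a generic smoother offered only to show what the weighting does. It is not the model fitted by Glass (1968), and the function name and the weight `alpha` are assumptions made for the illustration:

```python
def ewma(series, alpha):
    """Exponentially weighted moving average of a sequence.

    s[0] = x[0]
    s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]

    A larger alpha gives more weight to the most recent observation, so each
    smoothed value depends on all preceding observations with geometrically
    declining weights.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = []
    for x in series:
        if not smoothed:
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical toy series with an abrupt drop in level, not real fatality data.
toy = [10, 10, 10, 4, 4, 4]
print(ewma(toy, alpha=0.5))
```

With a series whose level shifts, as in the toy input, the smoothed values approach the new level gradually, which is why a model of this kind can accommodate a nonstationary series.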

The particular form taken by the nonindependence
of the error components of
successive observations on each subject de- 
termines the specific nature of the problems in 
designing the experiment and in extracting full 
information from the data. The different kinds 
of situations that have been recently considered 
in the literature are reviewed below. 


CROSSOVER DESIGNS WITH STATISTICALLY 
INDEPENDENT ERROR COMPONENTS 


Perhaps we should begin with the situation 
in which each observation can be taken to be 
statistically independent of each other. Finney 
(1964), Cochran and Cox (1957), and various 
other textbooks describe designs in which the 
same subject is used more than once but the 
successive observations can be assumed to be 
statistically independent. No new problems 
arise in the design or analysis of such experi- 


ments—the conventional procedures, applic- 
able to situations in which each subject is 
used only once, apply also to situations in 
which each subject is used repeatedly, pro- 
vided that the successive observations can be 
assumed to be statistically independent. 
Depending upon the number of treatments to 
be compared, the number of subjects available, 
and the number of occasions on which each 
subject can be used, etc., the experimenter 
may employ one or the other of the standard
designs available in textbooks. Reference may 
be made to Finney (1964, pp. 273-286) for a 
catalog of designs that are likely to be of 
practical use in different circumstances of the 
above kind. 

It may be of interest here to note that an 
extreme form of crossover designs of the kind
described above is the one in which the entire 
experiment is planned on one subject. Finney 
and Outhwaite (1956) presented a statistical 
discussion of such experiments. An obvious 
disadvantage of such designs is that if the total 
time during which a given subject can be under 
observation is fixed, the number of treatments 
that can be compared may be severely limited. 


SITUATIONS IN WHICH CARRYOVER 
EFFECTS ARE PRESENT 


If the effect of a treatment applied to a given 
subject in one period is likely to be carried over 
to the successive periods during which other 
treatments are applied to the same subject, 
then, in the analysis of the results, the methods 
applicable to the conventional designs (in 
which each subject is used only once) cannot 
be made use of as such. One method of pre- 
venting the carryover effects of treatments 
applied previously from affecting the effect 
of a treatment applied in a given period is to 
provide a sufficiently long rest period between 
the treatment periods. But, as many re- 
searchers have pointed out (see, e.g., Federer, 
1955, p. 444), it is not always possible or 
desirable to use such a procedure. The experi- 
ment described by Sheehe and Bross (1961), to 
which reference has already been made, 
illustrates the point. The alternative is to 
design the experiment in such a way that the 
required estimates can be obtained, if possible, 
without going through complicated calculations.
The estimates one is likely to be
interested in may be the direct effects free of
carryover effects, carryover effects free of direct
effects, or the sum of the direct and carryover
effects in a given period. To see what is
involved in designing an experiment that gives
estimates of the above effects, let us consider
the following simple design. (This design is
discussed in detail in Cochran & Cox, 1957,
pp. 133-134.)


TABLE 1

A CROSSOVER DESIGN THAT PERMITS ESTIMATION OF
DIRECT AND CARRYOVER EFFECTS

                      Subject
Period      1    2    3    4    5    6

  1         A    B    C    A    B    C
  2         B    C    A    C    A    B
  3         C    A    B    B    C    A

Writing $t_a$, $t_b$, and $t_c$ for the direct effects of
Treatments A, B, and C, respectively, and
$r_a$, $r_b$, and $r_c$ for the carryover effects of those
treatments, and assuming that the carryover
effect of a treatment will last only during the
period immediately following the one in which
the treatment has been applied, we note that
the observation for Subject 1 in the third
period (see Table 1) has the total treatment
effect given by (assuming an additive model)
$t_c + r_b$, the first component representing the
direct effect of C and the other, the carryover
effect of B. Similarly, the total treatment effect
on the second observation on Subject 6 is
$t_b + r_c$, and so on. As far as the observations
in the first period are concerned, the treatment
effect will consist of only direct effects, assuming
that all subjects have been treated alike
prior to the starting of the experiment. If, in
the analysis of the data, the above specification
of the treatment effects of the different observations
is made use of, it would be possible to
obtain estimates of direct effects free from the
influence of carryover effects, and vice versa.

The estimation of direct effects will be easier
if the experiment is designed in such a way
that each treatment is preceded by all other
treatments equally often. It can be easily
verified that the design shown in Table 1 has
that property. Such arrangements in which
each treatment is preceded by each other
treatment equally often are sometimes called
balanced designs.
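
The two statements above (the form of the total treatment effect and the balance of Table 1) can be checked mechanically. The following is a minimal sketch in Python; the function names and the string encoding of Table 1 are illustrative choices made here, not part of the original analysis.

```python
from collections import Counter

# Table 1, one string per subject (Periods 1 to 3 read left to right).
TABLE_1 = ["ABC", "BCA", "CAB", "ACB", "BAC", "CBA"]


def treatment_effect(sequence, period):
    """Symbolic total treatment effect for a 1-based period of one subject."""
    direct = "t_" + sequence[period - 1].lower()
    if period == 1:
        return direct  # no carryover before the first period
    return direct + " + r_" + sequence[period - 2].lower()


def predecessor_counts(design):
    """Count how often each treatment is immediately preceded by each other."""
    counts = Counter()
    for sequence in design:
        for previous, current in zip(sequence, sequence[1:]):
            counts[(previous, current)] += 1
    return counts


counts = predecessor_counts(TABLE_1)
print(treatment_effect(TABLE_1[0], 3))  # Subject 1, Period 3
print(treatment_effect(TABLE_1[5], 2))  # Subject 6, Period 2
print(sorted(counts.items()))
```

Each of the six ordered pairs of distinct treatments turns up the same number of times, and no pair of the form (A, A) appears, which is the sense in which the balance of Table 1 is incomplete.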

Balancing in Table 1, however, cannot be 
said to be complete, since no treatment is 
allowed to follow itself. As we see later on, if 
the design is completely balanced, the direct 
effects can be estimated in the same manner 
as though there were no carryover effects. 
Needless to add, the estimation of direct and 
residual (carryover) effects from experiments 
designed as in Table 1 is more complicated 
than it would have been had the balancing 
been complete. Cochran and Cox (1957, pp. 
135-138) illustrated how to analyze data from 
experiments designed as in Table 1. 


Before examining the drawbacks of incompletely
balanced designs such as the one
shown in Table 1, let us see how to construct
such designs. A simple and easily remembered
method to construct such designs when the
number of treatments is an even number has
been given by Bradley (1958). The steps
involved in Bradley's method are the following:
Let n, the number of treatments, be equal
to 2k.

Step 1. Number the 2k treatments successively
from 1 to 2k.

Step 2. Construct a 2k × 2k table in the
following manner: Assign integers 1 to 2k
to the 2k cells in the first row by entering
successive numbers in every other cell from
left to right, beginning with the first and
reversing the direction once the end of the row
has been reached, but making sure that the
return starts from the last (2kth) cell. Thus, if
2k = 4, the first row would look like this: 1, 4,
2, 3. In each column, starting with the number
already entered in the top cell, proceed downward
entering in each cell the integer immediately
following the one in the cell just
above, except that the integer 2k is to be
followed by the integer 1.
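
The two steps can be sketched in Python as follows. The function names are hypothetical, and reading each row of the finished table as the sequence of treatments given to one subject is an assumption made here (the text does not say which way the table is to be read); under that reading every treatment is preceded exactly once by every other treatment.

```python
from collections import Counter


def bradley_square(k):
    """Build the 2k x 2k table described in Steps 1 and 2."""
    n = 2 * k
    first_row = [0] * n
    # Left to right: successive integers in cells 1, 3, 5, ...
    for value, cell in enumerate(range(0, n, 2), start=1):
        first_row[cell] = value
    # Return trip: continue in cells 2k, 2k - 2, ..., 2.
    for value, cell in enumerate(range(n - 1, 0, -2), start=k + 1):
        first_row[cell] = value
    # Each column counts upward from its top entry, with 2k followed by 1.
    return [[(top - 1 + step) % n + 1 for top in first_row] for step in range(n)]


def predecessor_counts(rows):
    """Count ordered (previous, current) treatment pairs along each row."""
    counts = Counter()
    for row in rows:
        for previous, current in zip(row, row[1:]):
            counts[(previous, current)] += 1
    return counts


square = bradley_square(2)
for row in square:
    print(row)
print(sorted(predecessor_counts(square).items()))
```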

Sheehe and Bross (1961) gave another
procedure, again simple and easy to remember,
which is applicable whether or not n is an
even number. Their procedure involves the
following steps:

Step 1. Number the treatments from 1 to n.

Step 2. Start with a cyclic n × n Latin
square, that is, one in which the ith row is
i, i + 1, ..., n, 1, 2, ..., i − 1.

Step 3. Interlace each row of the cyclic
Latin square with its own mirror image (reverse
order sequence). For example, if n = 5,
the first row of the cyclic Latin square reads
1, 2, 3, 4, 5. Its mirror image (reverse order
sequence) is 5, 4, 3, 2, 1, which when interlaced
with the original sequence gives 1, 5, 2, 4, 3, 3,
4, 2, 5, 1.

Step 4. Slice the resulting n × 2n arrangement
down the middle, thus yielding two
n × n arrangements. The columns of each
n × n arrangement represent the periods, the
rows the subjects, and the numbers within the
square the treatments.

If n is an even number, only one of the
n × n squares needs to be used to produce
the balance required (i.e., to ensure that each
treatment is preceded by each other treatment
the same number of times). But if n is an odd
number, both squares must be used in order
that the required balance is attained. Needless
to add, the squares obtained in the above
fashion represent the minimum arrangement;
the balanced arrangement may be replicated
any number of times. The above method thus
requires, if n is an even number, a multiple
of n subjects, and if n is an odd number, a
multiple of 2n subjects for ensuring the type
of balancing under discussion.
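
A minimal Python sketch of the four steps is given below, together with a check of the even/odd statement. The function names are illustrative; each row of a square is read as one subject's sequence over the periods, as Step 4 specifies.

```python
from collections import Counter


def sheehe_bross_squares(n):
    """Return the two n x n squares obtained in Steps 1 to 4."""
    left, right = [], []
    for i in range(n):
        # Step 2: ith row of the cyclic Latin square.
        row = [(i + j) % n + 1 for j in range(n)]
        mirror = row[::-1]
        # Step 3: interlace the row with its mirror image.
        interlaced = [x for pair in zip(row, mirror) for x in pair]
        # Step 4: slice the 2n entries down the middle.
        left.append(interlaced[:n])
        right.append(interlaced[n:])
    return left, right


def predecessor_counts(rows):
    """Count ordered (previous, current) treatment pairs along each row."""
    counts = Counter()
    for row in rows:
        for previous, current in zip(row, row[1:]):
            counts[(previous, current)] += 1
    return counts


def is_balanced(rows, n):
    """True if every ordered pair of distinct treatments occurs equally often."""
    counts = predecessor_counts(rows)
    return len(counts) == n * (n - 1) and len(set(counts.values())) == 1


left5, right5 = sheehe_bross_squares(5)
print(left5[0] + right5[0])  # the interlaced first row for n = 5
print(is_balanced(left5, 5), is_balanced(left5 + right5, 5))
left4, right4 = sheehe_bross_squares(4)
print(is_balanced(left4, 4))
```

For n = 5 a single square fails the balance check while the two squares together pass it; for n = 4 one square suffices.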

It is perhaps pertinent to note that as early 
as 1949, Williams had shown that if the 
number of treatments is even, balanced designs 
of the type shown in Table 1 can be obtained 
by choosing a single suitable Latin square 
and that if the number of treatments is odd, 
two Latin squares taken together would pro- 
vide the required balancing. 

Let us now turn to the drawbacks of the 
types of designs we have been just describing. 
As Lucas (1957) and Cochran and Cox (1957, 
pp. 139-140) have pointed out: (a) The 
estimates of the carryover effects yielded by 
the designs of the type shown in Table 1 
would be less precise than the corresponding 
estimates of the direct effects; (b) the variance 
of the sum of the direct and carryover effects of 
any treatment would be greater than the sum 
of the variances of the direct and carryover 
effects on account of the likely positive associa- 
tion between the two kinds of effects. 

A procedure that reduces the impact of the
above drawbacks has been suggested by Lucas
(1957); the suggestion is to make use of what
are called "extra-period" designs that can be
easily constructed from the basic changeover
designs of the type shown in Table 1 by
repeating the treatments used in the last
period in an extra period. The extra-period
design derived from the design shown in Table
1 is shown in Table 2.


TABLE 2


EXTRA-PERIOD DESIGN INVOLVING THREE TREATMENTS


                     Subject
Period     1    2    3    4    5    6
  1        A    B    C    A    B    C
  2        B    C    A    C    A    B
  3        C    A    B    B    C    A
  4        C    A    B    B    C    A


Note. This design is derived from the basic changeover
design shown in Table 1.


As can be easily seen from Table 2, every 
treatment is preceded equally often by every 
treatment including itself. The addition of the 
extra period makes the direct and carryover 
effects orthogonal, thus making it easier to 
estimate them. For actual formulas needed 
in the analysis of extra-period designs, reference 
may be made to either Lucas (1957) or Cochran 
and Cox (1957, p. 140.) As those formulas 
reveal, the extra-period designs are also sub- 
jected to Drawback a of the basic changeover 
designs (see above), although to a consider- 
ably less extent. 
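
The construction and the balance property can be illustrated with a short Python sketch. The encoding of Table 1 as strings and the function names are choices made here for illustration only.

```python
from collections import Counter

# Basic changeover design of Table 1 (one string per subject).
BASIC = ["ABC", "BCA", "CAB", "ACB", "BAC", "CBA"]


def add_extra_period(design):
    """Repeat each subject's last-period treatment in one extra period."""
    return [sequence + sequence[-1] for sequence in design]


def predecessor_counts(design):
    """Count ordered (previous, current) treatment pairs within subjects."""
    counts = Counter()
    for sequence in design:
        for previous, current in zip(sequence, sequence[1:]):
            counts[(previous, current)] += 1
    return counts


extra = add_extra_period(BASIC)
print(extra)
print(sorted(predecessor_counts(extra).items()))
```

All nine ordered pairs, including (A, A), (B, B), and (C, C), occur equally often in the extended design, whereas the basic design contains only the six pairs of distinct treatments.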

Patterson and Lucas (1959) showed how to 
derive classes of extra-period designs from 
basic changeover designs by stipulating the 
kind of balancing that is of interest. They 
clearly showed that the type of balancing 
required is dictated by the forms of the esti- 
mating equations. In other words, the effects 
to be estimated and the effects assumed to be 
in operation determine the type of balancing 
that should be attained in a design. If carryover 
effects of given treatments do not last beyond 
the first period following the periods of appli- 
cation of the treatments, the balancing prob- 
lem can be handled more easily than when the 
carryover effects last longer. 

To apply the methods suggested by Patter- 
son and Lucas (1959) to construct extra-period 
designs, one should have an appropriate basic 
changeover design to start with. For an extensive
catalog of changeover designs, reference
may be made to Patterson and Lucas (see Footnote 2). In their 
1962 monograph, Patterson and Lucas gave 
basic changeover designs in which the number 
of periods equals the number of treatments, 
as well as designs in which the former is less 
than the latter. They did not, however, discuss 
two-period designs. In clinical trials and 
psychological experiments, two-period change- 
over designs are likely to be very useful. 
Patterson and Lucas briefly mentioned such 
designs and pointed out that in such designs 
the direct and carryover effects cannot be 
estimated separately. But Grizzle (1965) 
showed that under certain conditions the 
direct and carryover effects can be estimated 
separately from two-period changeover de- 
signs. According to Grizzle’s results, if the 
subject effects can be assumed to be random 
variables, the contrasts involving carryover 
effects as well as those involving direct effects 
are estimable, although the difference between 
the period effects is not. Grizzle showed also 
that a two-period changeover design is prefer- 
able to the one in which each subject is used 
only once, if the carryover effects of the 
different treatments are equal and the corre- 
lations between the responses to different 
treatments are positive. If the above conditions 
are not present, the designs in which each 
subject is used only once are preferable to the 
changeover designs. 


CHANGEOVER DESIGNS THAT FURTHER 
SIMPLIFY ESTIMATION OF DIRECT 
EFFECTS 


From the remarks made in the preceding
section about extra-period designs, it should
be clear that if such designs exist and are
practicable, they are to be preferred to the
corresponding basic changeover designs. Extra-period
designs are not without their own
drawbacks, however. Referring to the design
presented in Table 2, it can be shown that the
normal equations for estimating the contrasts
involving the treatment effects of A and B
reduce to

$$T_a - T_b = 7.5(\hat{t}_a - \hat{t}_b) + \tfrac{1}{4}(B_2 + B_6 - B_3 - B_4)$$

for direct effects, and

$$R_a - R_b = 6(\hat{r}_a - \hat{r}_b)$$

for first residual effects (i.e., carryover effects
in the period immediately following the one of
application), where

$T_k$ = total of observations on treatment k,

$R_k$ = total of observations immediately
following treatment k,

$B_s$ = total of observations on sth subject,

$\hat{t}_k$ = estimate of the direct effect of treatment k, and

$\hat{r}_k$ = estimate of the first residual effect of
treatment k.

From the above equations, it should be clear
that estimation of contrasts involving direct
effects (e.g., $\hat{t}_a - \hat{t}_b$) is not as simple a matter
as in the case when carryover effects are
absent, and the successive observations can
be assumed to be independent. The question
therefore arises whether designs could be constructed
that further simplify the estimation
of direct effects. The changeover design shown
in Table 3, first proposed by Quenouille (1953,
p. 196), achieves that simplification.


2 H. D. Patterson & H. L. Lucas. Change-over
designs. (Tech. Bull. No. 147) United States Department
of Agriculture: North Carolina Agricultural
Experimental Station, 1962.
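
As a numerical illustration, the layout of Table 2 implies that the difference between the totals on A and B equals 7.5 times the direct-effect contrast plus one quarter of (B_2 + B_6 − B_3 − B_4), and that the difference between the totals following A and following B equals 6 times the residual-effect contrast. The Python sketch below checks both relations on error-free artificial data, so that the true effects can stand in for their estimates. The parameter values, the seed, and the function names are arbitrary assumptions of this sketch.

```python
import random

# Table 2 (extra-period design), one string per subject, Periods 1 to 4.
DESIGN = ["ABCC", "BCAA", "CABB", "ACBB", "BACC", "CBAA"]


def simulate(design, t, r, seed=0):
    """Noise-free observations: mean + subject + period + direct + carryover."""
    rng = random.Random(seed)
    subject = [rng.uniform(-5, 5) for _ in design]
    period = [rng.uniform(-5, 5) for _ in design[0]]
    y = []
    for s, sequence in enumerate(design):
        row = []
        for p, treatment in enumerate(sequence):
            value = 10.0 + subject[s] + period[p] + t[treatment]
            if p > 0:
                value += r[sequence[p - 1]]  # first residual effect
            row.append(value)
        y.append(row)
    return y


def totals(design, y):
    T = {k: 0.0 for k in "ABC"}  # totals on each treatment
    R = {k: 0.0 for k in "ABC"}  # totals immediately following each treatment
    for sequence, row in zip(design, y):
        for p, treatment in enumerate(sequence):
            T[treatment] += row[p]
            if p > 0:
                R[sequence[p - 1]] += row[p]
    B = [sum(row) for row in y]  # subject totals; B[0] is Subject 1
    return T, R, B


t = {"A": 3.0, "B": -1.0, "C": 0.5}
r = {"A": 1.2, "B": 0.4, "C": -0.7}
T, R, B = totals(DESIGN, simulate(DESIGN, t, r))
direct_rhs = 7.5 * (t["A"] - t["B"]) + (B[1] + B[5] - B[2] - B[3]) / 4
residual_rhs = 6 * (r["A"] - r["B"])
print(round(T["A"] - T["B"], 6), round(direct_rhs, 6))
print(round(R["A"] - R["B"], 6), round(residual_rhs, 6))
```

The subject totals enter the first relation but not the second, which is why the direct-effect contrast is the less convenient of the two to estimate in this design.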

The design in Table 3 permits the calculation
of the sums of squares between subjects, between
periods, and between direct effects in the
same way as one would calculate them if
carryover effects are absent. (The calculation
of the sum of squares between carryover
effects, adjusted for subject effects, however,
is not as simple as the calculation of the other
sums of squares. We return to this point later.)

Berenblut (1964) has pointed out that the
Quenouille design shown in Table 3 is a
particular case of a general arrangement involving
v treatments, v² subjects, and 2v
occasions for each subject. To describe Berenblut's
general design, let the v treatments be
$A_1, A_2, \ldots, A_v$. Let the symbols $a_1$, $a_2$, etc.,
stand for the treatment sequences shown
in Table 4. Note that the sequence $a_2$ is
obtained from the sequence $a_1$ by cyclically
replacing $A_i$ by $A_{i+1}$, with the stipulation
that $A_v$ will be replaced by $A_1$. Similarly, $a_3$ can be
obtained from $a_2$; $a_4$ from $a_3$; and so on. If v
is an odd number, Berenblut's general changeover
design can be symbolically represented as
shown in Table 5.


TABLE 3


QUENOUILLE DESIGN


                Subject
Period     1    2    3    4
  1        A    A    B    B
  2        A    B    A    B
  3        B    B    A    A
  4        B    A    B    A


TABLE 4


TREATMENT SEQUENCES IN BERENBLUT'S
DESIGN SHOWN IN TABLE 5


Identification
symbol         Sequence
$a_1$          $A_1$   $A_2$   $A_3$   ...   $A_v$
$a_2$          $A_2$   $A_3$   $A_4$   ...   $A_1$
$a_3$          $A_3$   $A_4$   $A_5$   ...   $A_2$
...
$a_v$          $A_v$   $A_1$   $A_2$   ...   $A_{v-1}$

The arrangement in Table 5 can be easily 
arrived at by following the five steps described 
below: 

Step 1. Write the row numbers (standing for 
the periods) 1 to 2v in column 1. 

Step 2. Let the second column be marked,
at the top, 1 to v subjects, the third column,
(v + 1) to 2v subjects, and so on.

Step 3. Enter in column 2 the symbols
$a_1, a_2, \ldots, a_v$, first in that order and then in the
reverse order, so that from top to bottom the
symbols will read as follows:
$a_1, a_2, \ldots, a_v, a_v, a_{v-1}, \ldots, a_1$.

Step 4. For each alternate row, beginning
with the second, complete the cyclical permutation
of the symbol $a_j$. Thus if the beginning
symbol is $a_j$, the cyclical permutation in
that row would be the following:


$a_j, a_{j+1}, \ldots, a_v, a_1, \ldots, a_{j-1}$.


Step 5. In the remaining rows, repeat the
symbol that appears in column 2.

If v is even, the arrangement can be obtained
by following the same steps as the above,
except that Step 4 should apply to the alternate
rows beginning with the first, and Step 5 to
the rest.

To get the complete design, the symbols 
in Table 5 should be replaced by the corre- 
sponding treatment sequences. 
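
The five steps, followed by the replacement of symbols by treatment sequences, can be sketched in Python as below. This is an illustrative reading of the steps, not Berenblut's own presentation; the function names and the coding of treatments as the integers 1 to v are assumptions of the sketch.

```python
from collections import Counter


def berenblut_design(v):
    """Rows are Periods 1..2v, columns are Subjects 1..v*v, entries 1..v."""
    def sequence(j):
        # a_j: treatments A_j, A_(j+1), ..., wrapping from A_v to A_1.
        return [(j - 1 + p) % v + 1 for p in range(v)]

    # Step 3: a_1, ..., a_v followed by a_v, ..., a_1 down the first block.
    lead = list(range(1, v + 1)) + list(range(v, 0, -1))
    # Step 4 applies to rows 2, 4, ... when v is odd and 1, 3, ... when even.
    first_cyclic_row = 2 if v % 2 == 1 else 1
    design = []
    for row_number, j in enumerate(lead, start=1):
        if (row_number - first_cyclic_row) % 2 == 0:
            symbols = [(j - 1 + block) % v + 1 for block in range(v)]
        else:
            symbols = [j] * v  # Step 5: repeat the symbol across the row
        design.append([x for symbol in symbols for x in sequence(symbol)])
    return design


def predecessor_counts(design):
    """Count (previous, current) pairs down each subject's column."""
    counts = Counter()
    for earlier, later in zip(design, design[1:]):
        counts.update(zip(earlier, later))
    return counts


for row in berenblut_design(2):
    print(row)
print(sorted(predecessor_counts(berenblut_design(3)).items()))
```

For v = 2 the four subject columns are the sequences ABBA, BAAB, BBAA, and AABB (with 1 = A and 2 = B). In every case tried, each treatment follows every treatment, itself included, equally often.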

It should be easy to verify that the Quenouille 
design in Table 3 is a particular case of 
Berenblut’s general arrangement for the case 
v = 2. This can be done by noting that the 
rows in Table 3, after a simple reordering of 
the columns, can be symbolically represented 


as $a_1a_2$, $a_2a_2$, $a_2a_1$, and $a_1a_1$, where $a_1$ stands for
the sequence AB and $a_2$ for BA.

As has been pointed out above, in the 
analysis of variance of Berenblut's design, the 
sums of squares due to subjects, periods, and 
direct effects of treatments are obtained in 
the ordinary fashion, as if carryover effects 
were absent and the different observations 
were independent. To get the sum of squares 
due to the first residual (first carryover) effects, 
Berenblut (1967) has suggested the following 
procedure. Let $r_1, r_2, \ldots, r_v$ be the first residual
effects of the v treatments. From the tables for
orthogonal polynomials (Fisher & Yates,
1963) get (v − 1) orthogonal combinations of
$r_1, r_2, \ldots, r_v$. Let $\sum l_i r_i$ be one such combination.
Let $B_i$ represent the total of observations for
subjects whose last treatment is the ith one and
let $R_i$ be defined as the sum of observations
on treatments immediately following treatment
i. Then the sum of squares associated
with the combination $\sum l_i r_i$ is


$$\frac{\left[\sum l_i (2vR_i + B_i)\right]^2}{2v^2(4v^2 - 2v - 1)\sum l_i^2}.$$


TABLE 5


BERENBLUT'S GENERAL DESIGN FOR v TREATMENTS
(v = ODD NUMBER)


            Subjects     Subjects              Subjects
Period      1 to v       v + 1 to 2v     ...   v² − v + 1 to v²
1           $a_1$        $a_1$           ...   $a_1$
2           $a_2$        $a_3$           ...   $a_1$
3           $a_3$        $a_3$           ...   $a_3$
...
v           $a_v$        $a_v$           ...   $a_v$
v + 1       $a_v$        $a_1$           ...   $a_{v-1}$
...
2v          $a_1$        $a_2$           ...   $a_v$

The total of the sums of squares of the above 
type associated with the (v − 1) different 
orthogonal combinations of $r_1, r_2, \ldots, r_v$ gives 
the sum of squares due to the first residual 
effects. 

While Berenblut's design permits estimation 
of contrasts involving direct effects in the 
same simple way as in the case when carryover 
effects are absent, his design does not provide 
for such a simple estimation procedure for 
contrasts involving carryover effects. Lucas' 
design (Table 2) on the other hand permits 
easy estimation of contrasts involving first 
residual effects while making the estimation of 
contrasts involving direct effects somewhat 
complicated. Apart from the above difference 
between the two types of designs, there is the 
fact that Berenblut's design requires more 
subjects and more occasions per subject for a 
given number of treatments than does Lucas' 
design (Table 2); if the number of treatments 
is three, for example, Berenblut's design re- 
quires nine subjects and six occasions per 
subject, whereas Lucas' design requires six 
subjects and four occasions per subject. 

It is also worth noting, before concluding 
the comparison of the two types of designs, 
that both types suffer from a common draw- 
back. In neither of them does the precision of 
estimated contrast of first residual effects equal 
the precision of estimated contrast of direct 
effects. This drawback becomes problematic 
if both the types of contrasts are of equal 
interest to the researcher. If the researcher 
wants to equalize as much as possible the 
variances of estimated contrasts of direct 
effects and of the corresponding contrasts of 
residual effects, he may have to make use of 
an appropriate basic changeover design and be 
reconciled to the complexity that would 
accompany such designs in the data analysis. 
(Reference may be made, in this connection, 
for appropriate designs to Federer & Atkinson, 
1964, and Patterson & Lucas, see Footnote 2.) 


SOME OTHER DEVELOPMENTS 


1. So far, our remarks have been confined 
to designs that aim primarily at estimating 
direct and/or first residual effects. It should be 
noted that the procedures reviewed above are 
not strictly appropriate for estimating direct 
and first residual effects if it cannot be assumed 
that the effect of a treatment would not extend 
over periods beyond the first one following the 
one in which the treatment has been applied. 
For, if the effect of any treatment extends over 
several periods, the estimates obtained in any 
of the ways discussed so far—estimates of 
contrasts involving direct effects or of those 
involving first residual effects—will be affected 
by disturbances due to the effects of treat- 
ments that persist in second, third, etc., 
periods following the one of application of the 
treatments. It is, however, possible to construct
designs that balance treatments preceding
as well as those applied two periods
back. Williams (1949, 1950), for example,
showed how to construct such designs using a 
complete orthogonal set of Latin squares. 

2. Another development worth mentioning 
here concerns the construction of two-period 
designs that permit the estimation of Treat- 
ment X Period interaction. Balaam (1968) has 
shown that a two-period design for the above 
purpose can be obtained by taking the first 
two periods of an appropriate Berenblut design 
(see Table 5). Obviously, such a design would 
require v² subjects if the number of treatments 
is v. It should be emphasized that Balaam’s 
approach is applicable only if carryover effects 
are absent and the error terms can be assumed 
to be independently distributed. (By error 
term we mean here whatever is left in the 
observation after taking out the subject effect, 
the treatment effect, the period effect, and the 
interaction between treatment and period.) 

3. A third development that may be mentioned
here concerns the situation in which
the observations show a marked trend over
successive periods. When this happens, the
procedures described in the two previous sections
("Situations in Which Carryover Effects Are
Present" and "Changeover Designs That
Further Simplify Estimation of Direct Effects")
are not strictly applicable to estimating the
direct and carryover effects. Patterson (1950)
has described procedures that may be used in
such situations. Patterson's procedure involves
making use of somewhat complex mathematical
models. If carryover effects are not
present, the problem stemming from the
presence of a trend over time in the observations
can be handled without too much
difficulty. The next section is devoted to
methods applicable to such situations.


SITUATIONS IN WHICH THE ERROR COMPONENT 


Let us confine our attention to experiments
in which only one subject is used for the entire
experiment. (If there are two or more subjects,
the procedure applied to one subject may be
applied to the others, and the results can be
combined.) Let there be K treatments, and
let the number of replications be R. The
following is a simple example (K = 2; R = 8):


B A A B A B B A A B B A B A A B,


where A and B stand for the two treatments.

Suppose the observations can be initially
thought of as having two parts, the treatment
effect and the remainder. Suppose further that
what constitutes the remainder in the above
decomposition of each observation can be
thought of as the sum of two parts, a polynomial
trend in time and an error term. In
practical terms, the assumptions underlying
the above model may be described as follows:

1. There is an "aging" effect which is responsible
for the time trend in the data.

2. The aging effect can be represented as a
polynomial in time (linear, quadratic, cubic,
etc.).

3. The aging effect is statistically independent
of the treatment effect.

4. The variations unaccounted for (i.e., the
variance left unaccounted for by the treatment
effect and the aging effect) are represented
by random residuals.

5. The random components of the observations
just described are identically distributed.

Whatever be the design used (i.e., irrespective
of the order in which the treatments are
applied to the subject), analysis of the data
by least squares is valid. But to apply the least
squares analysis, the degree of the polynomial
representing the aging effect must be known
beforehand. Ordinarily, one may not know
beforehand the degree of the polynomial
representing the aging effect. Under such
circumstances, the estimation procedure
becomes too complicated. No wonder, therefore,
that researchers have been motivated to look
for designs that permit estimation of treatment
effects, eliminating aging effects, in situations
such as the one described above, without
having to go through extensive calculations.
Cox (1952) has given such a design for the
two-variable case involving aging effect that
can be represented by a polynomial of degree
less than or equal to three, there being no
requirement that the exact degree of the
polynomial should be known beforehand.
Cox's (1952) design permits estimation of
treatment contrast in the same simple way as
it would be estimated if aging effect were
absent. Unfortunately, it is not possible to
arrive at exactly orthogonal designs such as
the one given by Cox (1952) in many cases.
Reference may be made to Cox (1951) for
methods to arrive at approximately orthogonal
designs when exactly orthogonal ones are not
possible. If the aging effect is linear, however,
exactly orthogonal designs can be constructed
in a variety of situations using what are known
as magic squares and magic rectangles (Phillips,
1964, 1968a, 1968b). Phillips (1968b) has also
given some designs in which the aging effect
represented by a quadratic trend has been
balanced.
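
To make the idea of orthogonality to a polynomial trend concrete, the Python sketch below examines the 16-occasion example sequence quoted at the start of this section. The text does not say that this sequence is Cox's (1952) design; the sketch only checks a property of the sequence itself, namely that the sums of the occasion numbers raised to the powers 0 through 3 are the same for the two treatments. The trend coefficients and the function names are arbitrary choices for illustration.

```python
SEQUENCE = "BAABABBAABBABAAB"  # the K = 2, R = 8 example, occasions 1 to 16


def power_sums(sequence, treatment, max_degree):
    """Sum of t**d over the occasions t (1-based) on which a treatment is used."""
    times = [t for t, label in enumerate(sequence, start=1) if label == treatment]
    return [sum(t ** d for t in times) for d in range(max_degree + 1)]


def mean_difference(sequence, observations):
    """Mean of the B observations minus the mean of the A observations."""
    a = [y for label, y in zip(sequence, observations) if label == "A"]
    b = [y for label, y in zip(sequence, observations) if label == "B"]
    return sum(b) / len(b) - sum(a) / len(a)


sums_a = power_sums(SEQUENCE, "A", 3)
sums_b = power_sums(SEQUENCE, "B", 3)
print(sums_a, sums_b)

# Error-free data: a cubic aging trend plus a treatment effect of 2 for B.
trend = [0.5 + 0.3 * t - 0.02 * t ** 2 + 0.001 * t ** 3 for t in range(1, 17)]
data = [y + (2.0 if label == "B" else 0.0) for y, label in zip(trend, SEQUENCE)]
print(round(mean_difference(SEQUENCE, data), 9))
```

Because the power sums agree, any trend of degree three or less contributes equally to the two treatment means, and the simple difference of means recovers the treatment contrast without the degree of the trend having to be specified.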


A "BEFORE-AFTER" DESIGN USING A
SINGLE SUBJECT


In the introduction, reference was made to
social reform experiments. We now view such
experiments as those involving repeated observations
on the same subject. The designs of
reform experiments may be described as
follows:


... AAA ... ABB ... B ...


In this design, B stands for the "reform"
condition and A for the condition that prevailed
prior to the institution of the reform.
The question that the analyst may be asked to
answer is, Has the change of treatment from A
to B had any impact on the phenomenon
under study (e.g., whether liberalization of
abortion laws produced any increase in abortion
rate)? The data yielded by reform experiments
such as the above can be put in the form
of a time series consisting of, say, n observations
($y_1, y_2, \ldots, y_n$) prior to the introduction
of the reform and m observations
($y_{n+1}, y_{n+2}, \ldots, y_{n+m}$) afterwards. In terms of the language
of experimental design, we may say that the
first n observations are on Treatment A and the
last m observations on Treatment B. The
research question then becomes, Has there been
a shift in the level of the series as a result of
the change of treatment? Note that even if
there has been a shift in the level of the series,
that shift need not necessarily be attributable
to the change of treatment. Campbell (1969)
described a number of possible rival hypotheses,
any one of which may explain the shift in the
level of the series in question. It is the responsibility
of the analyst to examine each of the
possible rival hypotheses. If the analyst becomes
convinced that the shift, if any, in the
level of the series is attributable to the change
of treatment, and is not due to any other
plausible reason, then the model described
below may prove useful in testing whether
there has been any difference attributable to
the change of treatment.


TABLE 6


STRUCTURE OF OBSERVATIONS IN THE "BEFORE-AFTER"
EXPERIMENT USING A SINGLE SUBJECT


Occasion    Observation    Additive components
1           $y_1$          $g + t_A + e_1$
2           $y_2$          $g + t_A + be_1 + e_2$
3           $y_3$          $g + t_A + be_1 + be_2 + e_3$
...
n           $y_n$          $g + t_A + be_1 + be_2 + \cdots + be_{n-1} + e_n$
n + 1       $y_{n+1}$      $g + t_B + be_1 + be_2 + \cdots + be_n + e_{n+1}$
...
n + m       $y_{n+m}$      $g + t_B + be_1 + be_2 + \cdots + be_{n+m-1} + e_{n+m}$


It is perhaps interesting to note that if the
observations can be assumed to be statistically
independent of each other, the above hypothesis
can be tested using the traditional t
statistic. In most situations, however, it may
not be logical to assume that the successive
observations $y_1, y_2, \ldots$, are statistically independent
of each other. Described below is one
method of specifying the statistical dependence
of successive observations in situations such
as the one under discussion.

We assume that an additive model is applicable.
Let g represent a component common to
all the observations. Let the second component
of each observation be the corresponding treatment
effect. Let the observed value minus the

sum of g and the corresponding treatment effect 
be called the remainder, for the moment. The 
model described below differs from the ones 
mentioned in the preceding sections essentially 
with respect to the specification of the structure 
of the remainder component. 

Let us consider the remainder component 
of the first observation as the effect of a random 
shock. Another way of viewing it is as the 
effect of the uncontrolled events. We assume 
that once the subject experiences a random 
shock, he would undergo a permanent change. 
In other words, the random shock produces a 
permanent impact on the subject —permanent 
in the sense that it will last throughout the 
entire length of the time series taken for 
analysis. The permanent impact will be as- 
sumed to be a constant multiple of the amount 
of the shock actually received. Thus if $e_i$ is
the shock effect in the ith period, $be_i$ will be
taken to be the permanent effect resulting
therefrom. (In many practical situations in
which the above model applies, b seems to
assume a value between zero and one. Hence,
under those circumstances, b may be interpreted
as that fraction of the random shock
which becomes a permanent effect.) Obviously,
the permanent effect will be carried over to the 
subsequent periods and therefore will form part 
of the observations obtained in later periods. 
We are thus led to the structure in Table 6. 

From the structure shown in Table 6, it
should be clear that there are four parameters
to be estimated; they are g, $t_A$, $t_B$, and b. The
main interest of the experimenter is likely to
be centered in estimating and testing the
significance of the treatment contrast $t_A - t_B$.
Insofar as the interest is in drawing inference
about $t_A - t_B$, there is no loss of generality if
we set $t_A = 0$ and replace $t_B$ by $t_B - t_A$
(= d, say). With these substitutions, the
model described above reduces to the one
suggested by Box and Tiao (1965). Except
for the shift in level of the series represented
by the parameter d, the Box and Tiao model is
exactly the same as the exponentially weighted
moving average model widely used in economic
research. The model is strictly applicable only
if trend and "seasonal" fluctuations are absent.
In other words, if aging effect is suspected to be
present, or if some kind of periodic fluctuation
accounts for some of the variations in the
observations, these variations should be removed
before applying the model.

The estimation of d is a relatively easy
matter, as Box and Tiao (1965) have shown,
if the value of b is known. I do not know any
easy way of estimating the value of b in
specific experimental situations
(see, however, Glass, 1968).
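
The structure of Table 6 can be made concrete with a short simulation. The Python sketch below generates a series with the treatment effect before the change set to zero and the shift after it set to d, and checks that each first difference of the series equals the current shock minus (1 − b) times the previous shock, plus d at the changeover. This moving-average form of the differences is the usual link to exponentially weighted moving averages. The parameter values, the normal distribution of the shocks, and the function names are assumptions of this sketch; it does not implement the Box and Tiao estimation procedure.

```python
import random


def simulate_series(n, m, g, d, b, seed=0):
    """Observations with the structure of Table 6, taking t_A = 0 and t_B = d."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, 1.0) for _ in range(n + m)]
    series = []
    carried = 0.0  # b times the sum of all earlier shocks
    for i, shock in enumerate(shocks):
        level = g + (d if i >= n else 0.0)
        series.append(level + carried + shock)
        carried += b * shock
    return series, shocks


def first_differences(series):
    return [later - earlier for earlier, later in zip(series, series[1:])]


n, m, g, d, b = 6, 4, 10.0, 3.0, 0.4
y, e = simulate_series(n, m, g, d, b)
diffs = first_differences(y)
# Each difference should equal e_i - (1 - b) e_(i-1), plus d at the changeover.
expected = [
    e[i] - (1 - b) * e[i - 1] + (d if i == n else 0.0)
    for i in range(1, n + m)
]
print(max(abs(x - z) for x, z in zip(diffs, expected)))
```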

I have carried out several computations
using artificially created "experimental" data
with a view to finding out the problems of
fitting the above model to different empirical
situations. The results of that investigation
will be reported elsewhere.


CONCLUDING REMARKS


The main purpose of this article has been
to examine the recent works on experimental
designs that use repeated measurements on
each subject. Some readers may understand
by "repeated measurements" something other
than what has been meant by that term in
this study. Consider, for example, the following
experiment in which each cow receives Diet A
for the first 2 weeks, Diet B for the next 2
weeks, Diet C for the third 2 weeks, and so on.
The main observation to be analyzed would
be the milk yield. Obviously, corresponding
to each treatment (diet) the experiment provides
several observations (e.g., milk yield on each
day). In this article, however, it has been
assumed that there is only one observation
corresponding to each application of a treatment.
In the above experiment, the observation
corresponding to Diet A may be the
average of 2 or 3 days' milk yields at the end
of the 2-week period during which the cow has
been kept on that diet. Thus, in this study,
we do not mean by repeated measurement
the multiple measurements available for each
application of the treatments (e.g., the
daily milk yields from a cow when it has been
kept on Diet A). It is, however, fair to ask
what could be done with the multiple
observations available for each application of
the treatment. This question has not been
given any attention in this article. An obvious
use to which those observations can be put
under the above type of circumstances is to
check whether the experiment has been conducted
under "controlled" conditions. The
methodology involved in such investigations
is not simple enough to be reviewed here.
GENERATION OF RANDOM SEQUENCES BY HUMAN SUBJECTS:
A CRITICAL SURVEY OF LITERATURE

The subjective concept of randomness is used in many areas of psychological
research to explain a variety of experimental results. One method to study
randomness is to have subjects generate random series. Unfortunately, few
results of the experiments that used this method lend themselves to comparison
and synthesis because the investigators employed such a variety of experimental
conditions and definitions of mathematical randomness. Some suggestions for
future research are made.


In many different fields of psychological
research, the concepts of "subjective chance"
and "subjective randomness" have been used
almost exclusively to account for unexpected
results. Characteristic of subjective chance is
that it is not equal to mathematical chance;
subjects seem to expect dependencies between
successive events in spite of the fact that they
know that the events occur independently of
each other. Early in this century, psychophysics
became interested in this phenomenon
for the fact that successive responses of a subject
are mutually dependent. In the psychophysical
setting, the usual procedure is that
a binary choice is made. Particularly experienced
subjects are well aware that they are
supposed to choose the alternatives in a random
order. Even so, the subjective chance
phenomenon persists. Hence, one possible
explanation of interdependency of responses is
that the subject has his own idea of what a
random sequence looks like.

More recent research on subjective probability,
probability learning, and gambling behavior
also revealed that successive responses
of a subject were mutually dependent in
experimental settings where independent
responses were expected. Once again the
subjective concept of randomness was mentioned
as an explanation.

In the experiments on telepathy the concept
was used to account for too many correct
predictions of serial events. Clinical psy


have used the subjective concept of chance 
for the diagnosis of neurotics. Finally, ran- 
domization tasks were employed as a sec- 
ondary task in mental load measurements. 
Tune (1964b) presented a review on the 
interdependency of successive responses in 
various fields of psychological research. 

In spite of the wide use of concepts like 
subjective chance or randomness, the question 
of whether such a thing really exists has never 
been settled. There is even less unanimity 
with respect to the nature and degree of dis- 
similarity between objective and subjective 
randomness. This lack of information, and the 
fundamental interest in how people form ex- 
pectations in situations where chance is in- 
volved, induced a fair amount of research dur- 
ing the past 15 years. A score of experi- 
mental methods was designed to discover what 
subjects expect to happen by chance. 

Reichenbach (1949) was the first to claim 
that humans are unable to produce a random 
series of responses, even when instructed and 
duly motivated to do so. Subsequent publica- 
tions generally supported Reichenbach's prop- 
osition, but, with respect to the details, much 
confusion was introduced. Four randomization 
experiments and other relevant publications 
were reviewed by Tune (1964a). As the num- 
ber of publications dealing with the randomi- 
zation experiment has increased to at least 15, 
a new investigation of the state of the art
seems justified. The present survey is confined
to experiments in which subjects were instructed
to produce a random series of events.


DEFINITION OF THE RANDOMIZATION 
EXPERIMENT 
The 15 experiments discussed in the present 


review are characterized by several require- 
ments: 


1. The subject is explicitly instructed to 
produce a random series of events. Often the 
instruction refers to random processes like 
coin tossing or throwing dice. 

2. The series are long enough to prevent 
complete memorization. 

3. No stimulus or feedback is given to the
subject during the experiment, except possibly
for a pacing signal.

4. The subjects are normal adults. 


COMPARISON OF EXPERIMENTAL PROCEDURES 
AND CONDITIONS 


Experimental evidence on randomization is 
highly contradictory. One reason may be the 
striking divergence of experimental procedures 
used by the various experimenters. Some rele- 
vant factors, contributing to the disagreement 
among experimental results, are presented in 
Table 1. 

The number of alternative choices ranged 
from 2 to 26. It is likely that this difference 
in range is at least one of the reasons for 
the different experimental findings since Bad- 
deley (1966 [1962]*), Rath (1966), and War- 
ren and Morin (1965) found that nonrandom- 
ness increases with the number of alternatives. 

Some authors (Baddeley, 1966; Chapanis, 
1953; Lincoln & Alexander, 1955; Rath, 
1966; Teraoka, 1963) reported that part of 
nonrandomness was caused by a tendency to 
arrange the alternatives in a natural ordering. 
Other experimenters (Mittenecker, 1958; 
Teraoka, 1963; Zwaan, 1964) used alterna- 
tives that had no natural ordering. Hence, 
the nature of the alternatives can be con- 
sidered another factor responsible for the dis- 
agreement among experimental results. 

The number of generated elements per
series varied from 20 to 2,520, while some experimenters
used several series in one experimental
condition. Since boredom may be a
factor that increases nonrandomness (Weiss,
1964), it is likely that sequence length also
influenced the results to some extent.

2 A. D. Baddeley. Some factors influencing the
generation of random letter sequences. (Tech. Rep.
No. 422/62) Cambridge, England: Applied Psychology
Research Unit, 1962.

The experimental situation seldom included 
a visually displayed choice set. When the 
choice set was only defined by instruction, 
subjects first had to activate their internal 
representation of the set and, next, make a
random selection. In case of small choice sets,
the difference between an internally or ex- 
ternally represented choice set may be negli- 
gible, but it is at least doubtful whether in 
Baddeley's experiment the 26 letters of the 
alphabet were equally available to the subjects 
during the whole session. It is plausible that 
subjects used one small subset at a time, that 
they tried to make random selections only 
within the subset, and changed subsets occa- 
sionally. In that case, the series should have 
contained many digrams with elements in their 
natural ordering as, indeed, was reported fre- 
quently (see Table 3). Visual display of the 
set of alternatives, as used by Lincoln and 
Alexander (1955), Mittenecker (1958), Ross 
(1955), and Weiss (1964), may be one way 
to overcome this difficulty. 

Still another factor that should be taken 
into account is the mode of production. Only 
Weiss (1964) reported automatic registration 
of responses by means of push buttons. Most 
of the other experimenters had their subjects 
call out or write down the series. These two 
modes of production differ with respect to 
the availability of previous responses: the
spoken items can only be remembered, whereas
responses that are written down on a sheet of
paper remain present until the page is turned.
Only Wolitzky and Spence (1968) used an
apparatus by which all (written) responses
but one were covered. Since Tune (1964a)
attributed nonrandomness to the limited span
of short-term memory, the number of previous
responses visible to the subject is a variable
that should not be overlooked.

The rate of production was reported to be
an important factor by Baddeley (1966),
Teraoka (1963), and Warren and Morin
(1965). Among the 15 experiments under discussion,
response rate varied upward from .25
seconds per response, whereas production
could be paced or unpaced. Although there is
no agreement about the effect of an increasing
rate of production, since both increases
and decreases of nonrandomness have been
found, this factor evidently complicates the
randomization experiment.

Finally, the number of subjects varied from 
2 to 124. Individual differences were some- 
times rather large, which means that results 
based on small numbers of subjects cannot 
always be generalized. 

In general, it can be stated that no two 
experiments of our sample differ only in one 
of the factors mentioned above. Therefore, 
comparisons are questionable, to say the least. 


DEFINITION OF MATHEMATICAL RANDOMNESS


With respect to the definition of mathe- 
matical or objective randomness, little stan- 
dardization is evident concerning the criterion 
for calling a series random or nonrandom. 
Here a methodological problem arises, as ran- 
domness is easier disproved than proved. For 
disproving randomness it is sufficient to show 
one type of systematic trend in the series, 
whereas for the establishment of real random- 
ness it is required to prove that not a single 
serial regularity of the many possible ones is 
present. An endless repetition of the alphabet, 
for instance, is perfectly random regarding 
single-letter frequencies, but extremely non- 
random with respect to frequencies of pairs. 
A similar difficulty occurs when an experi- 
menter is interested in the increase or decrease 
of randomness: one series can be more random 
than another according to one criterion and, 
at the same time, less random in another respect.
Recognition of this problem is crucial
for the interpretation and comparison of experimental
results. The measures of nonrandomness
most frequently used are presented
in Table 2. If only frequencies of
the alternatives are taken into account, analyses
are of zero order. In zero-order
analyses, no dependencies among responses
can be established. For first-order
analyses, frequencies of digrams (pairs) are
needed; for second-order analyses, frequencies of
trigrams; etc. The general rule is that analyses
of order n, which require a count of
(n + 1)-grams, can yield dependencies between
responses that are maximally n places apart.
As shown in Table 2, few experimenters use
analyses higher than second order. The mathematical
origin of the measures is also rather
diverse: Witness the third column in Table 2.
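
The relation between order of analysis and n-gram counts can be illustrated with the repeated-alphabet example given above. The following is only a minimal sketch in Python; the function name `ngram_counts` and the choice of ten repetitions are assumptions made for illustration and do not come from any of the reviewed studies.

```python
from collections import Counter

def ngram_counts(series, n):
    """Count every string of n consecutive responses (an n-gram)."""
    return Counter(tuple(series[i:i + n]) for i in range(len(series) - n + 1))

# Hypothetical series: the alphabet repeated ten times.
alphabet = "abcdefghijklmnopqrstuvwxyz"
series = alphabet * 10

singles = ngram_counts(series, 1)  # zero-order analysis: single responses
pairs = ngram_counts(series, 2)    # first-order analysis: digrams

# Every letter occurs equally often, so the zero-order analysis
# finds nothing wrong with the series.
print(len(singles), set(singles.values()))

# Only 26 of the 26 * 26 = 676 possible digrams ever occur, so the
# first-order analysis exposes the regularity.
print(len(pairs), "of", len(alphabet) ** 2)
```

The same function with n = 3 gives the trigram counts needed for a second-order analysis, which is why the length of series required grows quickly with the order.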

One class of measures bears relation to
occurrence of runs, which are strings of
identical responses. The total number of runs,
used by Bakan (1960) and Zwaan (1964),
is essentially a first-order measure, since it
exceeds the number of digrams with unequal
elements by exactly one. The frequency distribution of runs
with length i, as used by Ross and Levy
(1958) and Teraoka (1963), is a measure
with all orders mixed in a mathematically
complex way. Distance-of-repetition curves
(Mittenecker, 1953, 1958; Zwaan, 1964) give
the frequency distribution of gaps with length
i between two identical responses, which are
actually runs of nonoccurrence of that alternative.
This again is a measure with all
orders mixed.
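
The run-based measures can be made concrete with a short sketch. The Python code below is illustrative only: the function names are hypothetical, the binary series is made up, and the distance of repetition is taken here as the number of positions between successive occurrences of the same alternative, which is an assumption about how the gaps are counted.

```python
from collections import Counter
from itertools import groupby

def run_lengths(series):
    """Lengths of successive runs (strings of identical responses)."""
    return [len(list(group)) for _, group in groupby(series)]

def repetition_distances(series):
    """Frequency distribution of the distance between successive
    occurrences of the same alternative (1 = immediate repetition)."""
    last_position = {}
    distances = Counter()
    for position, response in enumerate(series):
        if response in last_position:
            distances[position - last_position[response]] += 1
        last_position[response] = position
    return distances

series = "HHTHTTTHHT"  # made-up binary series
runs = run_lengths(series)
unequal_digrams = sum(a != b for a, b in zip(series, series[1:]))

print(runs)                        # run lengths in order of occurrence
print(len(runs), unequal_digrams)  # number of runs vs. unequal digrams
print(sorted(repetition_distances(series).items()))
```

The number of runs is always one more than the number of digrams with unequal elements, so it carries only first-order information, whereas the two frequency distributions depend on dependencies of every order at once.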

A second class contains measures from information
theory, like information per response
(Baddeley, see Footnote 2) and relative
redundancy in the series (Baddeley,
1966; see Footnote 2; Lincoln & Alexander,
1955; Mittenecker, 1958; Warren & Morin,
1965). Measures of this type require very
long series for higher-order analyses. Baddeley 
(1966) mentioned 4,000 responses for a first- 
order analysis of 26-alternative sequences. 
Hence, in practice, the analysis is limited to 
the third order. 
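
As a rough illustration of the information-theory measures, relative redundancy can be computed as 1 − H/H_max, where H is the estimated information per response and H_max is the logarithm (base 2) of the number of alternatives. The Python sketch below is an assumption about one common way to estimate these quantities, not the procedure of any particular study cited here; the function names and the test series are hypothetical.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def relative_redundancy(series, n_alternatives, order=0):
    """1 - H/H_max, with H the estimated information per response
    given the `order` immediately preceding responses."""
    span = len(series) - order
    grams = Counter(tuple(series[i:i + order + 1]) for i in range(span))
    h = entropy(grams)
    if order > 0:
        # Conditional entropy: H((order + 1)-grams) - H(preceding contexts).
        contexts = Counter(tuple(series[i:i + order]) for i in range(span))
        h -= entropy(contexts)
    return 1 - h / math.log2(n_alternatives)

series = "01" * 50  # strict alternation of two alternatives

print(relative_redundancy(series, 2, order=0))  # frequencies balanced
print(relative_redundancy(series, 2, order=1))  # fully predictable pairs
```

With 26 alternatives a first-order estimate has to fill a table of 26 × 26 = 676 digram cells, which gives some feeling for why several thousand responses are needed before such estimates become stable.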

Finally, for analyses above Order 4, often
autocorrelation curves are used, which have
again the disadvantage that estimates of dependencies
are not given separately for each
order (Chapanis, 1953; Mittenecker, 1958).
A series with an endless repetition of the
digram 0-1 will yield an endless autocorrelation
function with values +1, −1, +1, −1,
etc. Yet the simplest description of the dependencies
is a first-order alternation model.
One way to overcome this difficulty is to
calculate a power spectrum on the basis of
the autocorrelation (Pöppel, 1967). For the
computation of a power spectrum with s
terms, however, at least 72 autocorrelations
are needed, whereas the computation will

TABLE 2


MEASURES OF NONRANDOMNESS USED BY VARIOUS EXPERIMENTERS


Author(s) and year


Order of analysis


Description of the measure for nonrandomness

Baddeley (1962) 1 
| 0 
| 1 
| 1 
y Baddeley (1966) | 9 
| 1 
Bakan (1960) ! 
2 
0 
Chapanis (1953) 9 
1,2 
i-? 
Lincoln & Alexander (1955) | Ce 
| 0 
| 1 
2 
Mittenecker (1953) mixed 
0 
Mittenecker (1958) i 
| 1-11 
Rath (1960) | 1 
| 1 
| 
2 
| 2 
1 
Ross (1955) i 
0 
Ross & Levy (1958) » 
Teraoka (1963) i 
1 
1—4 
Warren & Morin (1905) 19 
Weiss (1964) 
2 
- l ; 0 
A olitzky & Spence (1968) 1 
^waa i 
vaan (1964) mixed 
ction 


orrelation fun 
], Unfortu- 


attempted 


SUccece 
is cessful only if the autoc 
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9m sequences. — 
m e Seneral, it can be concluded th à 
fu am of nonrandomness are T ; e rires 
no, Ugh for disproving all serial reg 
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| de, 
Cregg " 
ases of nonrandomness- 


| repetition of digrams 
redundancy 
stereotyped responses 
information per response 
redundancy 
| stereotyped and repeated digrams 
| number of runs 
alternation and symmetry in trigrams 
frequency of alternatives 
frequency of alternatives 
| frequency of digrams and trigrams 
| autocorrelation function 
redundancy 
| frequency of alternatives 
spatial distance between two alternatives in the 
digrams 
frequency of trigrams 
distance of repetition 
| frequency of alternatives 
redundancy 
autocorrelation function 
frequency of alternatives 
frequency of digrams corrected for frequency of 


alternatives 
frequency of trigrams corrected for frequency of 
digrams 


frequency of digrams as a function of the distance 
between the two elements in the natural ordering 

number of alternations 7 

frequency of alternatives 

number of alternations 

occurrence of runs 

frequency of alternatives 

conditional probabilities 

frequency of digrams a function of the distance 
between the two elements in the natural ordering 


occurrence of runs 
redundancy 
frequency of 
dependencies 
frequency of trigrams 
frequency of alternativ 
| number of runs 
distance of repetition 


(n)-grams corrected for lower-order 
es 


RESULTS AND THEORIES 


Considering the divergence of experimental
procedure and method in measurement, it is
not surprising that results are quite contradictory.
Actually, there is no way of combining
details of the results of the 15 publications
discussed into one coherent theory. Some
outcomes are presented in Table 3.
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TABLE 3 


RESULTS OF EXPERIMENTS ON SUBJECTIVE RANDOMIZATION

Author(s) and year | Are subjects good randomizers? | Positive or negative recency | Other systematic deviation from randomness | Factors increasing nonrandomness

Baddeley (1962, 1966) | no | ? | unbalanced 1- and 2-gram frequencies, stereotyped digrams | increase of rate of production and number of alternatives, introduction of secondary task
Bakan (1960) | no | | avoidance of symmetric response patterns |
Chapanis (1953) | no | | unbalanced 1-, 2-, 3-gram frequencies, preference to decreasing series, avoidance of increasing series | naivete of Ss
Lincoln & Alexander (1955) | no | Neg. | preference to the easy motor responses, to alternatives with a large spatial distance to the previous alternative, and to clockwise or counterclockwise ordered sequences | giving verbal response instead of motor response
Mittenecker (1953, 1958) | no | Neg. | balancing of frequencies within small samples | neuroticism
Rath (1966) | no | Neg. | preference to symbols adjacent in the natural sequence | increase of number of alternatives
Ross (1955) | yes | Pos. | |
Ross & Levy (1958) | no | Pos. and Neg.a | overuse of run length with expected frequency of at least 1 | naivete with respect to the expected frequency of runs
Teraoka (1963) | no | Neg. | response chaining related to the natural order of the alternatives, dependencies over at least 5 places, periodicity with period of 5 responses | presence of a natural order of alternatives, decrease of rate of production
Warren & Morin (1965) | no | ? | | increase of rate of production and of number of alternatives
Weiss (1964) | no | Pos.? | preference for symmetric trigrams | boredom
Wolitzky & Spence (1968) | no | ? | | increase of the information load of a secondary task
Zwaan (1964) | no | Neg. | |

a Negative after briefing about expected frequency of runs.


First, almost all experimenters found systematic
deviations from randomness. Only
Ross (1955) claimed that his subjects were
good randomizers. Second, most experimenters
found negative recency, which means too
many alternations or too many runs. Some
authors did not mention the direction of nonrandomness
because their measures could not
distinguish between negative and positive recency.
Positive recency was reported only for
first-order dependencies. Weiss's (1964) data
seemed to point to second-order positive recency,
provided that his relative frequencies
of trigrams were corrected to add up to 100%.
Although Ross's (1955) experiments indicated
real randomness, some objections can be
raised. His subjects were requested to stamp
symbols (X or O) on cards. This procedure
may have favored repetition (going on with
the same stamp) over alternation (taking another
stamp), for instance, because the subjects
were bored by the experiment and hence
took the easygoing way. Thus, the frequently
observed tendency toward alternation may
have been balanced out by this unintentional
facilitation of repetition.

Third, several other systematic deviations
from randomness were found, such as preference
to the natural order of the alternatives
and preference as well as avoidance of symmetric
patterns. In general, these systematic
trends are related to the nature of the stimuli.

Finally, Table 3 presents some factors that
are supposed to increase nonrandomness, but,
in view of the difficulties in defining such an
effect mathematically, these outcomes should
be evaluated with caution.

As far as the different theories are concerned,
there is one point of view that attributes
nonrandomness to the limitations of
short-term memory. Tune (1964a) argued
that subjects who can tally frequencies of
all n-grams may be random up to order
(n − 1). Baddeley (1966) claimed that the
very use of memory was responsible for serial
dependencies and proposed a theory based on
a limited capacity for generating information.
According to this theory, information generated
per time unit should be constant. The
increasing rate of production did make the
series more nonrandom, but, as shown before,
the results of this experiment might have been
contingent on mental representation of large
sets or parts thereof, rather than on random
selection. Teraoka (1963) found a decrease
of nonrandomness with an increase of speed,
whereas Warren and Morin (1965) found
the opposite. The latter authors stated, however,
that the amount of information generated
per time unit also increased with rate
of production. An interesting theory proposed
by Mittenecker (1953) and extended by
Zwaan (1964) suggests that subjects tend to
balance the frequencies of alternatives within
Small samples. Weiss (1965) pa for 
being! hich states Sab e piat 
'espor random and ied ynditions for being 
random. "tre poses ein be discredited 
Since ane i. nue} be- explained, either in 
terms of attention or in terms of
decreased distraction. Other theories deal with
boredom, experience with ordered sequences
in normal, daily life, etc. Thus far, there is no
reason to favor one theory over another,
since no reliably decisive experiments have
been published.


THE RELEVANCE EXPERIMENT 


Some theories mentioned above attribute 
nonrandom behavior in the randomization ex- 
periment to functional factors like memory, 
attention, and boredom. This implies that the 
randomization paradigm involves two factors 
at a time: subjective concept of randomness 
and some functional limitations of serial ran- 
domization. A necessary control experiment 
can be made by presenting sets of random 
and nonrandom series to subjects with the 
instruction to select the "true" random ones.
If nonrandomness is indeed attributable to a 
subjective concept, subjects should not be 
able to discriminate between random and non- 
random sequences even in this situation. This 
experiment determines the relevance of the 
notion of “subjective randomness” in the ran- 
domization experiment. 

Few authors report such a control experi- 
ment. Baddeley (1966) mentioned, without 
presenting any data, that subjects could select 
the correct series, suggesting that their concept
of randomness was perfectly all right.
Cook (1967) arrived at the same conclusion. 
He used nonrandom series that were so obvi- 
ously nonrandom that the data cannot be 
taken as decisive. Mittenecker (1953) and 
Zwaan (1964) both reported that subjects 
were unable to make the correct identification. 
The error was in the direction of negative 
recency. Wagenaar (1970b) found that subjects
were generally not able to indicate the
true random series, the bias being in the
direction of negative recency.


RECOMMENDATIONS FOR FUTURE RESEARCH 


The first problem to be solved is the prob- 
lem of measurement. A method is needed for 
measuring higher-order nonrandomness in 
short sequences. A very original approach was 
made by Vitz and Todd (1969), but this 
author feels that their method is not de- 
veloped well enough to allow for comparison 
of series with unequal length or number of 
alternatives. The present author is involved 
in another attempt to define nonrandomness 
in short sequences up to high orders of de- 
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pendency (Wagenaar & Truijens).* Some 
promising results were obtained with this 
method (Wagenaar, 1970a, 1971), but more 
experimentation is needed to establish whether 
the method is powerful enough. 

The second need is to develop the proposed 
theories mathematically to check more thor- 
oughly on the phenomenon that different 
theories predict identical results. 

The third step is to design decisive experi- 
ments that single out all factors responsible 
for nonrandomness. Especially the discrimina- 
tion between subjective concepts and func- 
tional factors, like memory and attention, 
deserves more experimentation. 


CONCLUSION 


Thus far, randomization experiments have 
not led to conclusive results. Further research 
in this field will yield useful information only 
if the experimental conditions are better con- 
trolled, if mathematical randomness is defined 
in a uniform way, and if the problems are so 
stated as to permit more critical experiments. 


1 W. A. Wagenaar & C. L. Truijens. Measurements
of high-order sequential dependency in short sequences.
(Tech. Rep. No. IZF 1970-19) The
Netherlands: Institute for Perception.


Contingent negative variation (CNV) is a slow, surface-negative electrical
brain wave. The basic experimental paradigm for generating CNV is like that
of a constant-foreperiod reaction time task and involves the presentation of a
warning stimulus (S1) followed by an imperative stimulus (S2), to which a
motor response is usually made. CNV appears within the S1-S2 interval as a
negative shift in the electroencephalogram (EEG) base line that averages
approximately 20 microvolts. Interpretations of CNV findings have involved
the psychological processes of expectancy, conation, motivation, and
attention. A two-process theory is proposed to account for CNV
results: Magnitude of CNV is positively and monotonically related to attention
and nonmonotonically (inverted U) related to arousal level. CNV is also associated
with other kinds of electrophysiological activity, notably autonomic
functions and slow cerebral potentials that accompany voluntary motor movements.
Although CNV is clearly a cerebral phenomenon, eye movements can
seriously distort its measurement. It is concluded that CNV is an electrical phenomenon
of the human brain that is related chiefly to attention and arousal
functions.


Fig. 1. Polygraph trace of contingent negative variation (CNV) and vertical
electro-oculogram (EOG). For Figures 1, 2, and 3, electrode placements are
vertex (Cz) and linked mastoid processes for CNV and 3 centimeters above
and 2 centimeters below the right eye for EOG. For CNV, relative negativity
at the vertex is depicted in all figures as upward. For EOG, relative negativity
at the supra-orbital site is shown as upward.


3 Walter has conceptualized the S1-S2-MR paradigm
of CNV development in terms of operant conditioning.
The first stimulus (S1) is considered a conditional
stimulus (CS); the second stimulus (S2), an
unconditional stimulus (UCS). CNV is the conditioned
response to S1 (the CS), and the evoked potential
(at least a negative component of it) is the
unconditioned response to S2 (the UCS) (Walter,
1964c, 1968). Presumably, the S1-S2-MR sequence is
a reinforcing state of affairs because of termination of
S2 by the MR. This situation corresponds to a "type
II operant conditioned reflex," according to Black
and Walter (1965, p. 35). In this operant situation
CNV lasted for hundreds of trials (Walter, ...).
However, in a classical conditioning paradigm, where
no voluntary response was made to S2 (UCS), for
instance, classical conditioning of the corneal reflex
with clicks (S1 or CS) and air puffs to the eye (S2
or UCS), CNV reached maximum amplitude after
about 20 trials and then declined (Walter, 1964a;
Walter et al., 1964).


Contingent negative variation (CNV) is an
electrical brain wave
that usually depends upon the association
(contingency) of two successive stimuli (Walter,
Cooper, Aldridge, McCallum, & Winter,
1964). The basic experimental paradigm for
development of CNV is that of a constant-foreperiod
reaction time (RT) experiment.
A warning or preparatory stimulus (S1) is
followed by an "imperative" stimulus (S2), to
which the subject makes a motor response
(MR).³ For example, as shown in Figure 1,
when a light flash (S1) was followed in 1.5
seconds by a continuous tone (S2), there appeared
within the S1-S2 interval a slow negative
shift in the base line of the electroencephalogram
(EEG). This "negative surge"
in the EEG base line is CNV (Walter, 1964c, 
p. 354). The subject’s key press terminated 
the tone, and CNV returned to base line. As 
in Figure 1, CNV has most often been re- 
corded with scalp leads at vertex (Cz in the 
10-20 international system for placement of 
EEG electrodes; cf. Jasper, 1958) and one 
or both (“tied”) mastoid processes. CNV is 
not consistently clear in the raw EEG trace 
of normal adults. Consequently, as in work on 
sensory evoked potentials, the technique of 
averaging has been used to increase CNV 
amplitude relative to background EEG.⁴ Under
normal conditions, a clear averaged CNV
can be obtained in normal adults with 6-12 
trials (cf. Figure 2). 
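
The gain from averaging can be illustrated numerically. The Python sketch below is a simplified model and not the procedure of any study reviewed here: it assumes that background EEG behaves like independent Gaussian noise from trial to trial, treats the 20-microvolt CNV and 40-microvolt background amplitudes cited in the accompanying footnote as a fixed signal and a noise standard deviation, and uses hypothetical names throughout. Under those assumptions the signal-to-noise ratio grows with the square root of the number of trials averaged.

```python
import math
import random
import statistics

SIGNAL_UV = 20.0  # mean CNV amplitude cited in the text
NOISE_UV = 40.0   # example background EEG amplitude cited in the text

def residual_noise_sd(n_trials, n_samples=2000, seed=0):
    """Standard deviation of the background activity that remains after
    averaging n_trials simulated trials, estimated over n_samples points.
    Assumes independent Gaussian noise, which real EEG only approximates."""
    rng = random.Random(seed)
    averaged = [
        statistics.fmean([rng.gauss(0.0, NOISE_UV) for _ in range(n_trials)])
        for _ in range(n_samples)
    ]
    return statistics.pstdev(averaged)

for n_trials in (1, 6, 12):
    simulated = SIGNAL_UV / residual_noise_sd(n_trials)
    predicted = (SIGNAL_UV / NOISE_UV) * math.sqrt(n_trials)
    print(n_trials, round(simulated, 2), round(predicted, 2))
```

On this idealized model the unfavorable 1:2 ratio of a single trial already exceeds 1:1 after 6 trials, which is in line with the statement that 6-12 trials suffice for a clear averaged CNV.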
4 CNV is not readily seen in the raw EEG trace
of some subjects, since the mean CNV amplitude of
20 microvolts (µv) is small relative to background
EEG, which is generally larger in amplitude, for
instance, 40 µv. This unfavorable signal (CNV) to
noise (background EEG) ratio of 20:40 or 1:2 is
made more favorable for CNV by averaging. The
technique of averaging takes advantage of the relatively
consistent time course of several CNVs, in
contrast to the random occurrence
of most background EEG waves. Further discussion
of averaging in the context of measurement of
evoked potentials appears elsewhere (Tecce, ...).

The report of Walter's discovery of CNV
in 1964 has inspired a rapidly burgeoning
body of research relating this neurophysiological
measure to psychological phenomena.⁵
The early work of Walter and co-workers 
focused on the concept of expectancy (Walter, 
1965a; Walter et al., 1964). This work was 
replicated and extended by Low and associ- 
ates who preferred conation, or intention to 
act, as an explanation of CNV changes (Low, 
Borda, Frost, & Kellaway, 1966). A third
group of investigators proposed general mo- 
tivation to explain CNV findings (Irwin, 
Knott, McAdam, & Rebert, 1966; Rebert, 
McAdam, Knott, & Irwin, 1967). Finally, attention
has been suggested as important in
CNV development (McCallum, 1969; Tecce
& Scheff, 1969) and as the primary psychological
correlate of CNV changes (Tecce,
...; Tecce & Scheff, 1969). The purpose of
the present study is twofold: to review brain-behavior
research defined by CNV studies and
to evaluate CNV changes as related to psychological
processes in man. A review of CNV
individual differences (including differences
in age, psychopathology, and pathophysiology)
appears elsewhere (Tecce, 1971).

5 The first report of CNV in 1964 followed Berger's
first recordings of EEG in man by almost 40 years
(Brazier, 1961). One reason for the delay in the discovery
of CNV is the need for computer averaging
to enhance CNV amplitude relative to background
EEG (Walter, 1966a). These averaging devices became
available only in recent years. There are several
other factors that may have retarded discovery of
CNV. Only recently have reliable amplifiers for DC
recording become conveniently available (Rowland,
1968; Thompson, 1967). Also, slow EEG shifts like
CNV are considered anathema in the work of most
clinical electroencephalographers; consequently, they
use short time constants, which prevent CNV-like
waves from being recorded clearly. Furthermore,
CNV is optimally recorded with widely separated
electrode placements on the scalp, for instance, vertex
to mastoid; with a conventional bipolar montage,
CNV is difficult to see (Walter, 1965a). Finally, the
S1-S2-MR paradigm, which best promotes development
of CNV, is not normally employed in EEG
work.

This review consists of 11 sections: (a)
1 characteristics of CNV; (b) CNV and 
Mulus factors; (c) CNV and response 
Variables; (d) CNV and organismic processes; 
E] CNV and distraction; (/) CNV and re- 
on time; (g) CNV and other electro- 
Paysiological activity; (A) CNV and indi- 
Mie i (D neurophysiological 
discussion; and (+) 
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Fic. 2. Averaged CNV and vertical EOG based on 
six trials. 


8 trials (Hillyard, 1969a) were sufficient. The
appearance of CNV can occur even on the
first trial, provided the subject clearly understands
the S1-S2-MR paradigm.⁶

Light flashes, clicks, and tones are the con- 
ventional stimuli used to generate CNVs. 
With these stimuli, a MR to S2 is usually
necessary to develop clear CNVs (e.g., Walter
et al., 1964). With highly novel stimuli (e.g.,
color patterns), large CNVs were developed
without the aid of a MR (Gullickson, 1970).
CNV occurred when either a physical or mental
response was made to S2 (Walter, 1966a),
for instance, merely judging the time of occurrence
of S2 (Low, Borda, Frost, & Kellaway,
1966) or thinking “now” in response to
S2 (Walter, 1968).
The maintenance and reliability of CNV
has not been studied adequately. There is one
report that CNV remained similar over months
(Walter, 1965a) and another that test-retest
reliability over 2-8 days was .80 (n = 34)
(Cohen, 1969). On the other hand, unsatisfactory
correlations were found for CNV recorded twice
within the same session (Straumanis, Shagass, &
Overton, 1969), although variations in experimental
procedures of the two test situations
in this latter study temper any
conclusions about CNV unreliability.
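
A test-retest figure such as the .80 cited above is ordinarily a product-moment correlation between the CNV amplitudes of the same subjects in two sessions. The following is a minimal sketch under that assumption (the text does not name the statistic); the function name and the amplitude values are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical CNV amplitudes (microvolts) for five subjects, two sessions.
session_1 = [18.0, 22.0, 15.0, 25.0, 20.0]
session_2 = [17.0, 24.0, 14.0, 23.0, 21.0]
print(round(pearson_r(session_1, session_2), 2))
```

A coefficient computed this way reflects the stability of the rank order and spacing of subjects, not agreement in absolute amplitude, which is one reason procedural differences between sessions can depress it.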
Fig. 3. Two types of CNV shape based on fast (Type A) and slow (Type B)
rise time. Each CNV and vertical EOG is based on six trials. 


liseconds after S, (Cohen, 1969). As can be 
seen in Figure 3, two shapes of CNV can be 
differentiated on the basis of the rise time of 
the ascending (negative-going) limb: (a) a
quick rise to peak and (b) a gradual rise to
peak with a “negative ramp shape" (Cohen, 
Offner, & Palmer, 1967), and have been desig- 
nated Types A and B, respectively (Tecce, 
1971). Type A CNVs have been found when 
subjects are uncertain about the time of occurrence
of S2, and Type B CNVs occur where
there is a high level of certainty (McAdam,
1969b). Rise time of CNV was fastest for
shortest S1-S2 intervals (McAdam, Knott, &
Rebert, 1969).

Descending limb. There is usually an “abrupt 
decline in the CNV about 120 milliseconds 
after the imperative stimuli [Walter et al., 
1964, p. 10]." This return to base line is 
called the resolution of CNV and can be im- 
mediate or slow and inconsistent (Bostem, 
Rousseau, Degossely, & Dongier, 1967). The 
descending limb of CNV can also become posi- 
tive in polarity by overshooting the base line, 
a characteristic called “positive-after-effect” 
(Cohen, Offner, & Blatt, 1965; Cohen & 
Walter, 1966). This positive overshoot of 
CNV is steeper when no MR is made to S2
compared to when a MR occurs (Lombroso,
1969; Waszak & Obrist, 1969). In these two
studies, eye-movement potentials were monitored.
Waszak and Obrist interpreted the find-
ings of greater positivity in the no-response 
condition as reflecting active inhibitory pro- 
cesses. 

Magnitude. CNV magnitude has been measured
in three ways. The usual method is
to measure the maximum negative voltage
reached by CNV within the S1-S2 interval
relative to the isoelectric EEG base line appearing
before S1 (e.g., Hillyard & Galambos,
1967; Tecce & Scheff, 1969). This “[...]
max” averages about 20 μv (Cohen, 1969;
Cohen et al., 1967; Jus et al., 1968; [...],
1967; Walter, Cooper, Crow, McCallum, Warren,
Aldridge, Storm van Leeuwen, & Kamp,
1967), has a range of 10-50 μv (Low, Borda,
Frost, & Kellaway, 1966), and has a standard
deviation of ± 4 μv (Cohen, 1969; [...],
1967). For this measure, the establishment of
a pre-S1 isoelectric EEG base line is often
accomplished by “eyeballing” the data, a
procedure that is somewhat subjective. A
second, similar method is to measure
voltage for a fixed pre-S2 epoch (e.g., [...]
seconds) relative to a fixed pre-S1 epoch of
base-line EEG (e.g., 1 second) (McCallum &
Walter, 1968b). Third, area under the CNV
trace has been used as a voltage-time (i.e.,
volt-second) hybrid measure of CNV (Low &
McSherry, 1968). Whether or not these and
other measures of CNV are relatively orthogonal
or largely redundant is an open question.
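
The three magnitude measures just described can be sketched in a few lines. This is an illustrative sketch only, not the procedure of any cited laboratory: it assumes an averaged trace sampled at a fixed rate, with negativity stored as negative microvolts, and it replaces the "eyeballed" base line with the mean of a fixed pre-S1 window. The function name, parameter names, and default window lengths are hypothetical.

```python
def cnv_measures(trace_uv, fs_hz, s1_s, s2_s, base_s=1.0, epoch_s=0.2):
    """Return (peak, pre_s2_mean, area) for one averaged trace.

    trace_uv: samples in microvolts (negative values = scalp-negative).
    fs_hz: sampling rate; s1_s, s2_s: onset times of S1 and S2 in seconds.
    base_s, epoch_s: assumed lengths of the pre-S1 and pre-S2 windows.
    """
    i1 = round(s1_s * fs_hz)
    i2 = round(s2_s * fs_hz)
    n_base = round(base_s * fs_hz)
    n_epoch = round(epoch_s * fs_hz)

    # Base line: mean voltage in the fixed window just before S1.
    baseline = sum(trace_uv[i1 - n_base:i1]) / n_base
    interval = [v - baseline for v in trace_uv[i1:i2]]

    # Method 1: maximum negative voltage within the S1-S2 interval.
    peak = -min(interval)
    # Method 2: mean voltage in a fixed pre-S2 epoch, relative to base line.
    pre_s2_mean = -sum(interval[-n_epoch:]) / n_epoch
    # Method 3: area under the trace (microvolt-seconds), trapezoid rule.
    dt = 1.0 / fs_hz
    area = -sum((a + b) / 2.0 * dt for a, b in zip(interval, interval[1:]))
    return peak, pre_s2_mean, area


# Synthetic example: flat base line, then a linear negative ramp before S2.
trace = [0.0] * 100 + [-20.0 * k / 150 for k in range(150)] + [0.0] * 50
peak, pre_s2_mean, area = cnv_measures(trace, fs_hz=100, s1_s=1.0, s2_s=2.5)
print(peak, pre_s2_mean, area)
```

On a smooth ramp the three values rise and fall together, which illustrates why the redundancy question is raised; they diverge mainly when CNV shape differs, as between quick-rise and gradual-rise traces.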


Topography

The scalp distribution of CNV has been
studied in both anterior-posterior and lateral
axes. In the anterior-posterior axis, CNV amplitude
has been found maximal at vertex
(Cohen, 1969; Gullickson, 1970; Walter,
1967), somewhat smaller in frontal areas, and
smallest in posterior regions. The finding that
CNV is prominent over the frontal lobes (Low,
Borda, Frost, & Kellaway, 1966; Walter et al.,
1964) has been attributed to spurious enhancement
from eye-movement potentials
(Vaughan, 1969; Vaughan, Costa, & Ritter,
1968). In the lateral axis, CNV appears bi- 
laterally symmetrical over the two hemispheres
(Cant, Pearson, & Bickford, 1966;
Cohen et al., 1965; Low, Borda, Frost, & 
Kellaway, 1966). 


Scalp and Direct Recordings 


Recordings of EEG by electrodes implanted
in the frontal cortex of the human brain
(Crow, Cooper, & Phillips, 1963) have shown
that “the electrical field of the CNV involves
a very large area in the frontal regions” as
well as central parts of the brain, “including
frontal association, motor, and somatosensory
zones [Walter, 1968, pp. 374-375].” This
distribution of CNV in the frontal cortex is
patchy, some areas showing waves as high as
40 μv and others “only a trace [Walter, 1967,
p. 124].” Although involving a small proportion
of neuronal elements (Walter, 1968), this
electrical activity underlying CNV development
was “very widespread and well-synchronized
[Walter, 1965c, p. 5].” The scalp
apparently acts as a spatial averager of this
widespread and well-synchronized activity
(Walter, 1967; Walter et al., 1964); scalp
CNV was attenuated by only one-half, compared
to CNV recorded directly from cortex
(Walter, 1964a). The morphology of scalp-recorded
CNV was similar to CNV obtained
with subcutaneous electrodes (Walter, 1965b)
and with epidural electrodes (Low, Borda,
Frost, & Kellaway, 1966). Scalp CNV most
resembled CNV recorded from the superior
frontal cortex (Walter, 1965c).


CNV AND STIMULUS FACTORS


The alteration of stimulus conditions has
produced CNV changes for a wide variety of
variables. These include stimulus withdrawal,
intensity effects, stimulus content, task difficulty,
and interstimulus intervals.


Stimulus Withdrawal


Extinction. The basic paradigm for development
of CNV involves the presentation of
both S1 and S2. However, when S2 was suddenly
omitted (without warning to the subject),
CNV amplitude was reduced [...]
complete suppression or extinction [...]
consecutive trials without S2 (Low,
Borda, Frost, & Kellaway, 1966; Walter,
1965a; Walter et al., 1964). Restoration of
S2 led to restoration of CNV in about 12
trials (Walter, 1965a).

Equivocation. When S2 was randomly
omitted in about 50% of trials without warning
to the subject, CNV amplitude was reduced
(Low, Borda, Frost, & Kellaway, 1966;
Walter, 1968; Walter et al., 1964). This reduction
in amplitude by equivocal presentation
of S2 was called an “equivocation” effect.
With 20% (Walter, 1965a) or 25%
(Low, Borda, Frost, & Kellaway, 1966) S2
omission, there was no equivocation effect
(that is, no CNV reduction). In one study
(Hillyard & Galambos, 1967), no change in
CNV was found when S2 was omitted in 50%
of trials. This inconsistency in results on the 
equivocation effect may be due in part to 
differences in experimental design and subject 
samples. For instance, in the latter study, 
wide individual differences were found in CNV 
development and decline, with some subjects 
showing even a slight increase in CNV am- 
plitude in the equivocation procedure. 

Social effects. In contrast to the gradual ex- 
tinction of CNV by unannounced omission of 
S2, when the subject was told that S2 would
not occur, CNV was reduced, either immediately
(Walter, 1968; Walter et al., 1964) or
in 6-12 trials (Low, Borda, Frost, & Kellaway,
1966). Walter (1965a) suggested that
a single social instruction indicating that S2
will not occur is equivalent to 20-50 actual
experiences of S2 omission and that a crucial
condition for the efficacy of social instruction
in reducing CNV is an abiding trust in the
experimenter. Consequently, Walter emphasized
the potential value of CNV as a technique
to study social communication as well
as personal mental processes. This suscepti- 
bility of CNV to social influence was par- 
ticularly striking in children between 5 and 
15 years of age and in adults during condi- 
tions of hypnosis (Walter, 1965a). 

This early pioneer CNV work on S2 omission
was interpreted by Walter and co-workers
in terms of expectancy processes (Walter et
al., 1964). That is, expectancy was defined as
the “subjective probability” or relative certainty
that S2 will follow S1 (Walter et al.,
1967). Consequently, with S2 omission there
was presumed to be a decrease in “the probability
of signal association as estimated by
the subject [Walter, 1964b, p. 434].” This
decrease in the relative certainty (expectancy)
that S2 will follow S1 was considered im-
portant in reducing CNV amplitude. Walter 
called CNV the Expectancy Wave or E wave 
(Walter, 1965a). 


Stimulus Intensity 


Early reports indicated that within wide 
limits, CNV amplitude was not a function of 
the energy level of S1 and S2 (Low, Borda,
Frost, & Kellaway, 1966; Walter, 1964a, 
1964b, 1965c), unlike amplitudes of sensory 
evoked potentials, which are closely related 
to stimulus intensity (Tecce, 1970). For in- 
stance, in an investigation of three levels of 
intensity of tones, clicks, and light flashes 
presented as S2, no differences were found in
CNV amplitudes for intensities or modality 
stimulated (Low, Borda, Frost, & Kellaway, 
1966). However, recent experiments have 
yielded positive findings. For example, in one 
experiment involving light-click trials (Low, 
Coats, Rettig, & McSherry, 1967), as click 
intensity was decreased in small steps from 
trial to trial, from audible to barely audible 
levels, CNV amplitude increased. This find- 
ing was interpreted in terms of CNV being 
related to greater attentiveness and alertness 
involved in the detection of the barely audible 
clicks. Similar findings were reported in a
study (Rebert et al., 1967) where two types
of randomly presented trials consisted of
either a light flash (S1) appearing on the
right followed by a moderate tone (65 decibels
[db.] above ambient noise) (S2) and the subject's
MR, or a light flash on the left fol-
lowed by a faint tone (difficult to detect) and 
a MR. With anticipation of the faint tone, 
CNV amplitude was significantly larger than 
when the moderate tone was expected. These 
results were interpreted as showing that in- 
creased effort required to detect the barely 
audible tone resulted in heightened motiva- 
tion, which in turn was related to increased 
CNV amplitude. In addition, enhanced attention
to S2 most likely accompanied the
detection of faint tones. In a similar study
(Connor & Lang, 1969), where both S1 and
S2 were either low-intensity tones (50 db.) or


high-intensity tones (80 db.), significantly 
lower CNV amplitude was associated with 
low- than with high-intensity stimuli. (Eye- 
movement potentials were recorded.) The ap- 
parent inconsistency between this finding and 
the results of Rebert et al. can be explained 
by the greater attentiveness reported by Con- 
nor and Lang for the high-stimulus-intensity 
situation (where CNV amplitude was ele- 
vated) and the greater attentiveness that 
most likely accompanied the detection of 
faint tones used by Rebert et al. (where CNV 
amplitude was elevated). In addition, the 
finding by Connor and Lang of low-CNV am- 
plitude with low-intensity tones was found 
clearest after unusually long periods of re- 
cording and may represent, in part, a habituation
effect, since both S1 and S2 were low-intensity
stimuli, and low-intensity stimuli are
more vulnerable to habituation than are high- 
intensity stimuli (Thompson & Spencer, 
1966). 

In two experiments employing nociceptive 
stimulation as S2 (Irwin et al., 1966; Straumanis
et al., 1969), two types of randomly
presented trials consisted of either an auditory
stimulus (S1) occurring to one side of the
subject followed by a high-intensity shock
(S2), to which a MR was required, or an
auditory stimulus on the other side followed 
by a weak shock and MR. The results of 
these studies are inconsistent. On the one 
hand, Irwin et al. (1966) reported that CNV 
amplitude was significantly higher when high 
shock could be anticipated compared to the 
weak-shock situation. This high-shock-high-CNV
relationship was discussed in terms of
a heightened drive state acquired through a
greater “conditioned emotional response” in
the high-shock situation (Irwin et al., 1966,
p. 542). However, Straumanis et al. (1969)
could not replicate the high-shock-high-CNV
finding and reported no difference between
high- and low-shock trials. It should be noted
that Irwin et al. determined a painful level
of shock individually for the strong-shock
condition, whereas Straumanis et al. applied
the same voltage level to all subjects. (There
are wide individual differences in responsiveness
to electric shock.) In addition, Irwin et
al. employed all male subjects, whereas Straumanis
et al. used half male and half female
subjects. Finally, Straumanis et al. recorded
eye-movement potentials. The contaminating
effects of eye-movement potentials on CNV
are discussed below.

The results of work on intensity of S2 suggest
that amplitude of CNV varies directly
with the alerting effects of and attentiveness
to S2. Further work with some standardization
of stimulus procedures and with monitoring
of eye movements would further clarify
the relationship.


Stimulus Content 


The amount of content or information has
been varied for S2 within a CNV trial and
over successive trials.

Stimulus complexity. The amount of information
available in S2 within a single CNV
trial defines the complexity of S2. In this
sense, Walter (1965a) observed that CNV
development was dependent on the information
content of S2, for instance, larger CNV
amplitude associated with more complex
Raven matrices. In a study of stimulus complexity
(Low, Frost, Maulsby, & McSherry,
1968; Low & McSherry, 1968), two separate
but related questions were investigated: (a)
Is CNV amplitude at a maximum value in the
usual test situation, or can CNV be further
enhanced; and (b) Are CNVs from separate
stimulus paradigms additive? In this experiment,
CNV was first developed by a tap on
the ankle (S1) followed in 1 second by a
[...]-Hertz (Hz.) tone pip (S2), to which a
nonoperant button press was required. A second
paradigm of light (S1), a 1000-Hz. tone
pip (S2), and a nonoperant button press was
then employed to generate another separate
CNV. The two experimental paradigms
were then presented together. CNV magnitude
(defined by area under the CNV wave) was
significantly larger when the two stimulus
situations were combined than when each
paradigm was presented alone. However, CNV
area in the combined condition was not large
enough to equal the sum of the separate
CNV areas. The authors concluded that there
is [...] physiological ceiling in CNV magnitude
and that CNVs are additive, but not in a
one-to-one algebraic relationship. These
findings on CNV and complexity of S2
suggest a positive relationship.


Stimulus diversity. The amount of informa- 
tion conveyed over several stimulus presenta- 
tions defines diversity of stimulus content. In 
this sense, there is suggestive evidence that increased
diversity of S2 enhances CNV amplitude.
That is, when S2 was varied among
four geometric forms (visual presentation of
either a circle, a square, a triangle, or a cross),
CNV amplitude was larger than when S2 was
always the same geometric form (triangle)
from trial to trial (Walter, 1965c). 

These suggestions that increased amounts
of information in S2 enhance CNV amplitude
may be viewed in terms of heightened attentiveness
to S2 when it is novel or complex.
These preliminary findings should be repli- 
cated on larger numbers of subjects, with eye 
movements carefully monitored. 

Since language is an important vehicle for 
the expression of thought, the use of verbal 
material to alter stimulus content would seem 
a worthwhile approach to relating CNV and 
cognitive processes. For instance, simple semantic
stimuli like “ready” and “now” have
been used as S1 and S2, respectively, to produce
CNV (Walter, 1965c, 1968).


Task Difficulty 


Several experiments have involved the study 
of CNV and tasks involving stimulus dis- 
crimination and detection and practice ef- 
fects. In a direct evaluation of CNV and task 
difficulty, Delse, Marsh, and Thompson⁷ required
subjects to make tone (S2) discriminations
designated easy and hard by S1. Eye
movements were recorded. CNV magnitude
was smaller for difficult discriminations than
for easy ones. The authors interpreted the
smaller CNV magnitude as partly due to
greater distraction in the difficult task, compared
to the easy one. An interesting feature
of this study was the clear development of
CNV in the absence of a MR, which was delayed
1.5 seconds after S2. In a study that
required a difficult discrimination (detection) 
of the presence or absence of a faint tone pip 


7 F. Delse, G. Marsh, and L. Thompson. The
contingent negative variation as a pre-motor
potential. Paper presented at the meeting of the Society
for Psychophysiological Research, Monterey,
California, October 1969.
(S2), which occurred 50% of the time after a
light flash (S1), CNV amplitude was sig-
nificantly larger when signals were detected 
than when missed (Hillyard, 1969b). (Eye 
movements were recorded.) This finding was 
interpreted as showing a relation of CNV to 
selective attention and as demonstrating that 
CNV can occur independent of preparation for 
motor activity (subjects waited 2 seconds 
after S2 before responding verbally).

Two studies have shown that CNV develop- 
ment is facilitated by learning effects, which 
presumably decreased task difficulty. In one 
experiment (Hillyard & Galambos, 1967), 
some subjects were given 24 light-click (S1-S2)
trials of training prior to the usual (S1-S2-MR)
paradigm while other subjects were
given lights and clicks in random order prior
to the S1-S2-MR condition. The trained group
showed significantly faster CNV acquisition
than the untrained group when tested in the
S1-S2-MR condition. This finding was interpreted
mainly in terms of expectancy processes
being facilitated by S1-S2 training. The
prior exposure of S1-S2 also very likely facilitated
attending to S2.

In another study indirectly related to task 
difficulty, subjects were required to turn off a 
continuous tone (S1) when they estimated
that 1.5 seconds had elapsed (McAdam, 1966, 
1967). After each time estimate, subjects were 
informed that they were “under,” “over,” or 
“on” 1.5 seconds in their guesses. If the 50
trials are arbitrarily divided into early, middle, 
and late categories, error in time estimation 
(without regard for direction) decreased sig- 
nificantly from early to middle trials, after 
which there was no change. Like time-estima- 
tion performance, CNV amplitude also in- 
creased significantly from early to middle 
trials, but then decreased significantly during 
the late trials to form an inverted-U function. 
These findings were interpreted as evidence 
that expectancy alone is not sufficient to ex- 
plain CNV changes, since as expectancy be- 
came more definite over trials, CNV decreased. 
In this study, expectancy was presumably de- 
fined in terms of increased precision in time 
estimation, rather than the relative certainty 
that S2 will follow S1 (Walter, 1967). These
studies suggest that attention to S2 is im-
portant in CNV development and that high


levels of task difficulty impair the attention 
process. 

Interstimulus intervals. The most frequently 
used S1-S2 intervals are 1.0 and 1.5 seconds.
However, CNV has been measured over a
wide range of interstimulus intervals, from
.5 to 20 seconds (Walter, 1967; Walter et al., 
1967) and up to 30 seconds (Walter, 1966a). 
When the interval exceeded 10-15 seconds, 
several CNV waves appeared, suggesting an 
attempt by the subject to form several time 
clusters (Walter, 1966a). For intervals of .5 
seconds, CNV was fully developed, whereas 
for .25- and .125-second intervals CNV was 
partially suppressed (Walter, 1964a). In a
systematic study of three interstimulus inter- 
vals, CNV amplitudes for .8- and 1.6-second 
intervals did not differ, but were significantly 
larger than for a 4.8-second interval (Mc- 
Adam et al., 1969). 

In summary, work on CNV and stimulus 
factors has shown CNV reduction when S2
was withdrawn, when S2 involved a difficult
task, and when interstimulus intervals were
lengthened. Amplitude of CNV was increased
when S2 was of low intensity or consisted of
varied content. Much of this work suggests
that attending to S2 is an important process
involved in CNV development. 


CNV AND RESPONSE VARIABLES 


The development of CNV has been studied 
in relation to the presence or absence of a
MR to S2, to force of MR, and to the operant
quality of the MR. 


Response Presence versus Absence 


One of the most reliable findings about 
CNV is that amplitude is significantly elevated 
when a MR is given to S2, compared to when
no MR is made (Irwin et al., 1966; Jus et al.,
1968; Low, Borda, Frost, & Kellaway, 1966;
Peters, Knott, Miller, Van Veen, & Cohen,
1970; Small & Small, 1970; Straumanis et al.,
1969; Walter et al., 1964). However, CNV
occurs in the absence of a MR, particularly
when S2 is novel (Gullickson, 1970),
nociceptive (Irwin et al., 1966), or involves
decision making (Delse et al., see Footnote 7;
Donchin, Gerbrandt, & Leifer⁸).


8 E. Donchin, L. Gerbrandt, and L. Leifer. Is the
contingent negative variation contingent on a motor
Force of Response


Two experiments have demonstrated that
CNV magnitude is significantly greater when
a large amount of force or muscular effort is
required (and anticipated) for MR to S2,
compared to a low-force condition. These results
have been interpreted as showing that
CNV is related to motivational states (Rebert
et al., 1967), to “intent to perform [Low &
McSherry, 1968, p. 206],” and to a “complex
state of physiological activation, mobilization
or preparation set [Low et al., 1968, p. 286].”


Operant Responses 


Walter’s original conception of CNV as
being a conditioned response hinged partly on
the operant action of the MR to S2, for instance,
termination of a series of clicks (S2)
by a button press (MR) (Walter et al., 1964).
In this sense, an operant MR is used in most
CNV work. However, in some studies, clear
CNVs are recorded when S2 is a discrete
stimulus, for instance, a .2-second tone pip
(Low & McSherry, 1968), which is unaffected
by the MR. In one experiment where the
operant quality of the MR was evaluated
(Peters et al., 1970), CNV amplitude was significantly
larger when the MR terminated a
series of light flashes (S2), compared to a
single flash situation having no MR effect.
Similarly, CNV was significantly enhanced
when a shock could be avoided by a fast MR
to S2 (Cant & Bickford, 1967). In both experiments,
the results were interpreted as
motivational or arousal effects. It is also possible
that heightened attentiveness to S2 accompanied
operant responses.
In a somewhat different operant situation
(Sidman avoidance), subjects were required
to button press between the thirteenth and
fifteenth seconds of successive 15-second
cycles to avoid a loud noxious buzzer (Low,
Borda, Frost, & Kellaway, 1966). A CNV-like
wave occurred immediately before the
MR. The authors emphasized that CNV was
produced without S1 and S2 being external
signals and rejected the notion that CNV is
mainly determined by the statistical associa-


response? Paper presented at the meeting of the
American EEG Society, Washington, D. C., September
1970.


tion of S1 and S2. These CNV-like waves appearing
before the occurrence of a MR without
S1 and S2 being observables are most
likely motor readiness potentials (cf. section 
entitled “CNV and Other Electrophysiological 
Activity”). 

In summary, CNV is closely related to
properties of the MR to S2. The findings suggest
that CNV is related to response intention
and energy mobilization (arousal), as well as
to attending to S2.


CNV AND ORGANISMIC PROCESSES


Attempts have been made to change CNV 
amplitude by altering organismic processes 
via verbal instructions or by direct manipula- 
tion of physiological states. 


Instructions to Change Psychological Set 


When subjects were instructed to concentrate
hard and respond very quickly to S2,
CNV amplitude was elevated (McCallum & 
Walter, 1968b). This finding was interpreted 
as showing a relation of CNV to “changes in 
the focus of conscious attention [McCallum 
& Walter, 1968b, p. 327]." When subjects 
were given 25¢ for each motor response to
S2 that was faster than 200 milliseconds, CNV
amplitude was significantly elevated.⁹ In this
monetary incentive condition, subjects reported
paying close attention to S2 in order to
respond quickly. (Eye-movement potentials
were recorded.) Where fast eye movements
were required to S2, CNV amplitude was
tripled for the three subjects tested (Hillyard
& Galambos, 1970). Similarly, CNV amplitude
was larger when subjects were instructed
to take an “expectant attitude” upon presentation
of S1 than when they were told to take
an “indifferent attitude” (Jus et al., 1968). In
a study of self-instructed sets, three experimenter-author-subjects
increased and decreased
CNV amplitude by “thinking” high
and low CNV, respectively (McAdam, Irwin,
Rebert, & Knott, 1966). These results were
interpreted as showing that subjects can “at
will” exert “conative control” over CNV. Ac-


9 J. J. Tecce and N. M. Scheff. Attention and DC
potentials (“Contingent Negative Variation”) in the
human brain. Paper presented at the meeting of the
Society for Psychophysiological Research, Washing- 
ton, D. C., October 1968. 


cording to the authors, the modus operandi 
used to achieve this control involved altering 
“vigilance” to S2, for instance, “imagined
that the second stimulus was difficult to detect,”
“concentrated on perceiving the second
stimulus and making a fast response to it
[McAdam et al., 1966, p. 195].” Consequently,
attending to S2 appears to have been
involved in changing CNV amplitudes. In one
study, no change in CNV was found when
subjects were told to respond to S2 as fast as
possible and that their responses would be 
compared with those of other subjects. (Eye- 
movement potentials were recorded.) The 
authors interpreted this finding as “not con- 
sistent with the hypothesis that motivational 
state is a primary determinant of the CNV 
[Waszak & Obrist, 1969, p. 118].” 


Physiological Conditions 


Drugs. There has been little study of drug 
effects on CNV amplitude. Informal observa- 
tions based on two subjects indicated that 
CNV was reduced following 36 hours of 
abstention from the normal intake of caffeine 
in tea and coffee, and that CNV was restored 
in amplitude following ingestion of caffeine 
sodium citrate (Walter, 1964a). Walter also 
reported that “similar effects have been observed
with amphetamine¹⁰ [Walter, 1964a,


10 Preliminary findings from experiments with nor- 
mal adults indicate a paradoxical decrease in CNV 
magnitude from 30 to 50 minutes after oral adminis- 
tration of 10 milligrams of dextro-amphetamine. 
Furthermore, this reduction was accompanied by 
feelings of fatigue, sleepiness, and sometimes melancholia.
One to 2 hours postdrug, CNV magnitude in-
creased, and, in individuals who characteristically 
showed Type B CNVs (slow rise time), this elevation 
was relatively greater for “early CNV” (initial epoch 
of 300-500 milliseconds) compared to “late CNV” 
(final epoch of 300-500 milliseconds). This change in 
CNV morphology from Type B to Type A suggests 
the possible usefulness of rise time, slope, and latency
to maximum amplitude in the measurement of CNV.
Since amphetamines are known to have activating
effects on the brainstem reticular formation, these
preliminary data suggest that “early CNV” is related
to arousal processes. The fact that dextro-amphetamine
did not increase Type A CNVs (where early
amplitude is already large) raises the possibility of
a physiological “ceiling” for CNV (Knott & Irwin,
1967), at least in circumstances where CNV [...]
Wilder’s Law of Initial Values, which
p. 314].” In an individual who had taken
100 micrograms (μg.) of LSD-25, CNV ampli-
tude was markedly increased. 

Sleep deprivation. After both 1 and 2
nights of sleep loss, there was a significant
reduction in CNV amplitude as measured in
the late half of the S1-S2 interval (4.5 sec-
onds), but no change as measured in the early 
half. The “late” CNV measure was larger 
after 4 nights of recovery sleep compared to 
a base-line day before sleep loss. This finding 
of different results for “early” and “late”
measurements of CNV suggests a heteroge-
neity of CNV over time. 
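
Separate "early" and "late" measurements of this kind can be obtained by averaging the base-line-corrected trace over each half of the S1-S2 interval. The sketch below assumes that approach; the text does not describe the exact scoring used in the sleep-loss study, and the function name, the pre-S1 base-line window, and the synthetic trace are hypothetical.

```python
def early_late_cnv(trace_uv, fs_hz, s1_s, s2_s, base_s=1.0):
    """Mean CNV (positive = more negative-going) in each half of S1-S2.

    trace_uv: averaged samples in microvolts (negative = scalp-negative).
    base_s: assumed length in seconds of the pre-S1 base-line window.
    """
    i1 = round(s1_s * fs_hz)
    i2 = round(s2_s * fs_hz)
    n_base = round(base_s * fs_hz)
    baseline = sum(trace_uv[i1 - n_base:i1]) / n_base
    interval = [v - baseline for v in trace_uv[i1:i2]]

    # Split the interval at its midpoint and average each half.
    mid = len(interval) // 2
    early = -sum(interval[:mid]) / mid
    late = -sum(interval[mid:]) / (len(interval) - mid)
    return early, late


# Synthetic trace: small early negativity, larger late negativity.
trace = [0.0] * 10 + [-5.0] * 20 + [-15.0] * 20 + [0.0] * 5
print(early_late_cnv(trace, fs_hz=10, s1_s=1.0, s2_s=5.0))
```

Scoring the two halves separately is what allows a manipulation such as sleep loss to be seen affecting one portion of the wave and not the other.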

In conclusion, amplitude of CNV was enhanced
when instructions to subjects appeared
to heighten attention to S2. Further
work is needed with larger groups of subjects
to determine effects of centrally acting drugs
on CNV.


CNV AND DISTRACTION


Interference with CNV development has
been demonstrated by the presentation of
stimulation extraneous to the usual S1-S2-MR
paradigm. Informal reports indicated that
amplitude of CNV was reduced both by
has been applied to psychophysiological functions
(Lacey, 1956; Wilder, 1930). The enhancement of
CNV 1-2 hours postdrug was accompanied by conspicuous
perceptual changes, none of which occurred
in placebo sessions. For instance, the appearance of
eyeblinks and other eye movements during CNV
trials was sharply curtailed, and subjects seemed to
fixate their gaze (“amphetamine stare”), even during
rest periods between blocks of trials. In addition
to this narrowed perceptual focus, there were spontaneous
verbal reports that “concentrating” and
“paying attention” to S1 and S2 were easier. Subjects
also reported that time went by fast and some
underestimated the correct time of day by as much
as 14 hours. Finally, during rest periods between
blocks of CNV trials, subjects showed and verbalized
a strong need for motor activity (cf. Footnote 3).
Clearly, the effects of dextro-amphetamine on CNV
are not simple ones and appear to be determined by
complex neuropsychological functions, which include
facilitated attention to S2 and arousal processes.
11 P. Naitoh, L. C. Johnson, and A. Lubin.
Modification of surface negative slow potential in
the human brain after total sleep loss. Paper presented
by title at the meeting of the Society for
Psychophysiological Research, Monterey, California,
October 1969.
endogenous distractions, such as daydreaming
(Rousseau, Bostem, & Dongier, 1968), and
full bladder (McCallum, 1967; McCallum &
Walter, 1968b), and by exogenous distractions,
such as conversation (McCallum &
Walter, 1968b; Walter, 1964a; Walter et al.,
1967), reading (Walter, 1964a; Walter et al.,
1967), irrelevant “buzzes” occurring between
S1 and S2 (Walter, 1968), classical music
(McCallum & Walter, 1968b), and television
programs (McCallum, 1969; McCallum &
Walter, 1968b). In one experiment, which
utilized the television procedure, CNV was
reduced by presentation of pictures of stick
figures within the S1-S2 interval (McCallum,
1969; McCallum & Walter, 1968b). CNV reduction
was greater when the interpolated
stimuli were highly interesting and dynamic
pictures (e.g., two stick figures being drawn
by a hand) than for simple pictures (e.g., an
arrow).

In a more systematic study of distraction,
CNV amplitude was reduced in both normals
and anxiety patients when irrelevant tones
were presented between S1-S2 (click-flash)
pairs (McCallum, 1967, 1969; McCallum &
Walter, 1968b). This distraction effect
persisted longer for patients, that is, over 25
trials partial restoration of CNV occurred for
normals but not for patients. (Patients were
on drugs at the time of testing.)

Reduction of CNV amplitude has been
demonstrated for discrete (phasic) and
sustained distraction conditions. In one
experiment (Tecce & Scheff, 1969), the control
condition involved a flash-tone-keypress
paradigm. Four distraction conditions involved
the auditory presentation of four digits or
letters either before or within the S1-S2
interval (1.5 seconds). For example, in one
distraction task (“letters within”), the
letters “A,” “P,” “I,” and “O” were presented
within the S1-S2 interval in a different order
for each trial. Attention to the extraneous
stimuli was insured by requiring their recall
following the MR to S2. CNV amplitude was
significantly reduced in all distraction
conditions. Eye-movement potentials were
monitored. Speed of response (RT) to S2 was
significantly lengthened when extraneous
stimuli occurred within the S1-S2 interval.
(Paradoxically, some subjects reported feeling
aroused and alerted and having faster RT
to S2.) The authors interpreted these results
as demonstrating that CNV amplitude is “a
sensitive measure of attentional processes in
humans [Tecce & Scheff, 1969, p. 333].” Two
lines of evidence suggested that CNV reduction
during distraction was not due to lowered
drive or arousal level. First, subjects
reported greater effort in the distraction
conditions, where attention was divided between
giving fast responses to S2 and listening to the
extraneous stimuli, compared to control
conditions, where only a MR to S2 was required.
This heightened effort was pronounced when
the extraneous stimuli were given within the
S1-S2 interval. Second, heart rate levels were
significantly elevated in distraction conditions,
a finding which rules out lowered drive or
arousal level as an explanation of CNV
reduction. Thus, it was concluded that CNV
reduction was a function of lowered
attentiveness to S2. In addition to these findings
based on discrete presentations of extraneous
stimuli (presented briefly before or within
the S1-S2 interval), sustained distraction
(adding 7s continuously, either aloud or
silently) nearly abolished CNV (Tecce &
Scheff, see Footnote 9). (The suppression of
CNV is more clearly seen when mental
arithmetic is silent because of CNV-like shifts
in the EEG base line caused by speech.)
Sustained distraction by adding 7s also affected
CNV morphology. Rise time of CNV was
delayed, and latency to maximum amplitude
was retarded. This finding suggests a link
between distraction and the occurrence of Type
B CNVs, which have a delayed latency to
maximum amplitude. As in the distraction
experiments on short-term memory for letters and
digits (Tecce & Scheff, 1969), RT to S2 was
significantly slowed by mental arithmetic, and
heart rate levels were significantly elevated.
These distraction experiments, then,
demonstrated a dissociation between attention and
arousal processes in relation to CNV. That
is, CNV amplitude was positively related to
attending to S2 and inversely related to
autonomic arousal. In conclusion, distraction
clearly reduces CNV amplitude and appears
to be one of the most powerful variables that
can disrupt CNV development. These findings
suggest that attention is a primary
psychological correlate of CNV development. The
accompaniment of the “distraction effect” by
elevated heart rate levels suggests that
heightened levels of autonomic arousal may
mediate disruption in normal CNV development
(Tecce, 1971).


CNV AND REACTION TIME


Extensive work has been reported on the
relationship between variations in CNV
amplitude and changes in speed of response or
reaction time (RT) to S2. Evaluation of these
findings is somewhat difficult in view of the
variety of procedures used to obtain and
report CNV-RT relations, for instance,
control versus treatment conditions, within- versus
between-subject analyses, and within- versus
between-treatment conditions.


Control Conditions 


Intraindividual relationships. Early observa- 
tions by Walter and co-workers suggested 
that with repeated training trials (S1-S2-MR),
CNV became more stable and RT to S2 became
faster (Walter, 1964b, 1965c, 1966a;
Walter et al., 1964).12 When CNV trials were
classified as fast or slow based on RT to S2,
amplitude of averaged CNVs was significantly
larger when RT was fast (Connor & Lang,
1969; Lacey & Lacey, 1970; Tecce & Scheff,
1969; Waszak & Obrist, 1969). Eye movements
were monitored in the latter group of
studies. In other work, RT was faster for
trials where CNV occurred than for non-CNV
trials (Guibal & Lairy, 1967). Significant
intraindividual correlations have also been
reported to show the high-CNV-fast-RT
relationship (Hillyard, 1969a; Hillyard &
Galambos, 1967), although these correlations
were sometimes statistically nonsignificant
(e.g., Peters et al., 1970).

12 Since the CNV paradigm is a constant-foreperiod
RT situation, subjects can learn to estimate
the S1-S2 interval and to respond when they think S2
will occur rather than after it actually occurs.
Therefore, MRs to S2 (especially the fast RTs of 50-80
milliseconds reported by Walter) could be a time
reflex rather than a true RT. This possibility was first
raised by Walter (1964a, 1965a, 1965b, 1968), who
then marshaled the following evidence against it.
First, if subjects respond by guessing the time of
occurrence of S2, they would underestimate as often as
overestimate the correct time of occurrence of S2;
consequently, there would result a normal
distribution of RTs around S2. This distribution does not
actually occur; instead, it is positively skewed with
a peak for short latencies (Walter, 1968). In
addition, premature MRs to S2 are usually rare once
CNV has developed. Walter noted that when S2 is
withheld without warning to the subject (as in a
“catch” trial in RT experiments), rarely are there
any MRs to S2, and there is no EMG discharge
at the time S2 would have occurred. The conclusion
from Walter's work is that MRs to S2 are true RTs
and not time reflexes.

Interindividual relationships. The relation- 
ship between CNV and RT has also been 
studied across individuals. For example, a
significant between-subjects correlation was
found for CNV amplitude and speed of
response to S2 (Hillyard & Galambos, 1967).
On the other hand, this correlation was not 
found in another study (Waszak & Obrist, 
1969), and no difference in CNV amplitude 
occurred between subjects who have fast RTs 
and those who have slow RTs (Connor & 
Lang, 1969). 

In control conditions, then, CNV amplitude 
is related to speed of response to S2 for
intraindividual analyses where subjects are used
as their own statistical controls. Results are 
inconclusive for interindividual analyses. 
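
The intraindividual analysis described above (classifying one subject's own trials as fast or slow on RT to S2 and comparing averaged CNV amplitude between the two classes) can be sketched as follows. This is a minimal illustration and not the procedure of any study cited here: the median split, the function name `cnv_by_rt_class`, and the trial values are all assumptions made for the example.

```python
from statistics import mean, median


def cnv_by_rt_class(trials):
    """Split one subject's trials at the median RT and return the mean
    CNV amplitude for the fast class and for the slow class.

    `trials` is a list of (rt_ms, cnv_amplitude_uv) pairs (hypothetical units).
    Trials falling exactly on the median are dropped so that the two
    classes do not overlap.
    """
    cutoff = median(rt for rt, _ in trials)
    fast = [cnv for rt, cnv in trials if rt < cutoff]
    slow = [cnv for rt, cnv in trials if rt > cutoff]
    return mean(fast), mean(slow)


# Hypothetical data for a single subject: (RT in ms, CNV amplitude in microvolts).
trials = [(180, 22.0), (195, 20.5), (210, 19.0),
          (240, 16.5), (265, 15.0), (300, 13.5)]

fast_cnv, slow_cnv = cnv_by_rt_class(trials)
print(f"fast-RT trials: {fast_cnv} uV, slow-RT trials: {slow_cnv} uV")
```

Because each subject serves as his own control, the comparison is made within one subject's trials before any pooling across subjects; an interindividual analysis would instead compare one summary value per subject.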


Noncontrol Situations 


The relation of CNV to RT has been studied 
where other variables (experimental treat- 
ments) were of central interest. 

Between-treatment comparisons. There is
evidence that experimental treatments which
affect CNV amplitude also change RT. For
example, significant decreases in CNV
amplitude were accompanied by significantly longer
RTs when distraction occurred (Tecce &
Scheff, 1969); when the S1-S2 interval was
lengthened (McAdam et al., 1969); and when
sleep-deprived subjects made little effort
to stay awake (Naitoh et al., see Footnote 11).
Similarly, significant increases in CNV
amplitude and significantly faster RTs occurred
when monetary rewards were given for fast
response to S2 (Tecce & Scheff, see Footnote
9) and when S2 was a strong shock, whether
avoidable (Cant & Bickford, 1967) or
unavoidable (Irwin et al., 1966). On the other
hand, there are reports of dissociations
between CNV amplitude and RT to S2, for
instance, CNV increase and no RT difference
from both increased force of response to S2
(Rebert et al., 1967) and sleep deprivation
(latencies of 1 second or more were excluded
and subjects tried to stay awake; Naitoh et
al., see Footnote 11). Further dissociations
include findings of no CNV change associated
with slower RTs resulting from partial S2
omission (Hillyard & Galambos, 1967) and no
CNV change related to significantly faster
RTs resulting from instructions to respond
fast (Waszak & Obrist, 1969). In this latter
study, subjects whose RTs were faster in the
speed set condition also had significantly
larger CNVs.

Within-treatment comparisons. Significant
positive intraindividual correlations were
reported for CNV amplitude and speed of
response to S2 within low- and high-muscular
effort conditions (Rebert et al., 1967) and
within conditions reflecting different S1-S2
intervals (McAdam et al., 1969). Amplitude
of CNV was larger for fast than for slow RT
trials when S2 was a weak (anticipated)
shock, but not when a strong shock was
used. No CNV-RT relation was found with
50% S2 omission (Hillyard & Galambos,
1967) or with 50% MR omission (Irwin
et al., 1966).


Clinical Populations

Informal observations have indicated that
low CNV amplitude and slow RTs are
associated with psychopathological conditions,
for example, anxiety neuroses (McCallum &
Walter, 1968b), and that improved
CNV development and speed of response
resulted from successful lobotomy of a phobic
patient and clinical improvement in a
patient with hysterical anesthesia below the
waist (Walter, 1966a).

In summary, CNV amplitude was positively
related to speed of response to S2 although
this relation has not been an entirely
consistent one. This relationship is clearest
when four conditions are met: (a) subjects
are used as their own controls, that is,
intraindividual comparisons between CNVs for
classes of RTs; (b) classification of RTs
involves no overlap, that is, fast versus slow
RTs; (c) extraneous sources of variance
from experimental variables are minimized,
such as exists in control conditions; (d) eye
movements are monitored and controlled for.
This latter criterion was missing in several
studies where no CNV-RT relationship was
found. Treatment conditions (e.g.,
distraction), which altered RT to S2, usually altered
CNV amplitude. It is concluded that a
significant function of CNV is to facilitate speed
of response to S2. Since RT has a hoary
tradition of being a measure of attention and
distraction (Evans, 1916; Morgan, 1916) and
alertness (Woodworth & Schlosberg, 1954),
the findings reviewed suggest that CNV is
related to attentiveness and phasic arousal in
the experimental task. That other processes
are also involved in CNV changes is suggested
by the low order of magnitude of correlations
between CNV and RT (and the corresponding
small amount of common variance shared by
these measures).
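
The link between a low correlation and a small amount of common variance follows from squaring the correlation coefficient. As a worked example, assuming a purely illustrative CNV-RT correlation of r = .30 (a value chosen for the arithmetic, not taken from any study reviewed here):

```latex
r^{2} = (0.30)^{2} = 0.09, \qquad 1 - r^{2} = 0.91
```

Under that assumption only 9% of the variance would be common to the two measures, leaving 91% to be accounted for by other processes.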


CNV AND OTHER ELECTROPHYSIOLOGICAL 
ACTIVITY 


The interpretation of CNV depends, in 
part, on its relation to other measures of 
electrophysiological activity. In this section, 
evidence is reviewed concerning the relation 
of CNV to background EEG, evoked potentials,
Bereitschaftspotential, motor potentials,
autonomic and autonomic-related functions, 
and eye-movement potentials. 


Background EEG 


Walter has underscored the importance of 
studying the interaction between intrinsic 
rhythms (background EEG) and evoked re- 
sponses as “an indispensable part of the 
cerebral information-flow system [Walter, 
1964c, p. 320].” Work on CNV and back- 
ground EEG has primarily involved informal 
reports of alpha and slow wave activity. 

Alpha activity. Lindsley (1969) has recently
pointed out that before reliable DC amplifiers
were available, early work suggested possible
relations between alpha and slow DC shifts
(Bishop, 1936; Jasper, 1936). In one
individual, CNV was markedly increased after
administration of LSD-25; during this
postdrug period alpha rhythms were “first
accelerated and then entirely suppressed
[Walter, 1964a, p. 314].” In another
individual, a persistent train of alpha waves
appeared on the CNV wave, which had been
reduced by distraction resulting from classical
music (McCallum & Walter, 1968b). Thus,
in two subjects, CNV amplitude was inversely
related to alpha activity. In a comparison
of “incidences of alpha activity” occurring
during the S1-S2 interval (where CNV
occurred) and during the intertrial interval
(where no CNV occurred), no convincing
difference in alpha activity was found (McAdam,
1969a). Since RT is related to both CNV
amplitude and alpha activity (Morrell, 1966),
the latter two measures probably share some
common variance and deserve further study.

Slow EEG activity. Recently, Walter
pointed out that CNV is largely imperceptible 
in EEG records having low frequency activity 
(e.g., delta), and that perhaps common cort- 
ical mechanisms underlie CNV development 
and “diminished functional competence as- 
sociated with delta activity in sleep or organic 
disturbance [Walter, 1968, p. 376].” In a 
study of one petit mal epileptic, CNV was 
recorded during photically induced spike and 
wave EEG; motor responses to S2 were
unimpaired (Winter, 1967). In a study of
patients having focal epileptic discharges of
temporal lobe origin, 3 of 11 individuals 
sometimes showed temporary suppression and
disorganization of CNVs particularly in the 
hemisphere not showing the epileptiform dis- 
charges (Zappoli, Papini, & Cabras, 1969). 
One of the most important problems in
CNV research is to elucidate the relation of
CNV to background EEG activity.
Unfortunately, little can be concluded from the few
[...] based on [...] samples of subjects is
needed. [...] determination [...] background
EEG [...] occurring synchronized with [...]
during the S1-S2 interval. Furthermore, this
relationship should be determined separately
for early and late segments of CNV and for
early and late phases of its development. Of
particular value in the study of CNV-EEG
relations would be the use of variables of
known influence on CNV (e.g., distraction),
as well as treatments of known influence on
background EEG (e.g., centrally acting drugs)
(Fink, 1968). Many averaging devices used
to record CNV are also capable of automatic
analysis of EEG frequency and amplitude
(e.g., Tecce & Mirsky, 1967).

14 A similar reduction in CNV accompanied
increased percent time alpha during placebo sessions
where subjects became bored and tired over 4 hours
of testing.

Evoked Potentials

In contrast to intrinsic electrical brain ac- 
tivity (background EEG), both CNV and 
evoked potentials (EPs) are studied by aver- 
aging evoked electrical responses recorded 
from the human brain (Tecce, 1970). Work on 
CNV and EPs is reviewed separately for EPs 
to S1 and S2 as presented in the usual CNV
paradigm and for EPs to stimulation
extraneous to S1 and S2.
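
The averaging procedure mentioned above, by which both CNV and EPs are extracted from single-trial recordings, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the method of any cited laboratory: the function name `average_epochs`, the prestimulus baseline correction, and the sample values are all hypothetical, and amplitudes are written with negativity as positive numbers.

```python
from statistics import mean


def average_epochs(epochs, baseline_samples):
    """Average time-locked epochs point by point.

    Each epoch is a list of voltages sampled at the same times relative
    to the stimulus. The mean of the first `baseline_samples` points
    (assumed to precede the stimulus) is subtracted from every point of
    that epoch before averaging across epochs.
    """
    corrected = []
    for epoch in epochs:
        baseline = mean(epoch[:baseline_samples])
        corrected.append([value - baseline for value in epoch])
    # zip(*corrected) groups the i-th sample of every epoch together.
    return [mean(column) for column in zip(*corrected)]


# Hypothetical single-trial voltages (microvolts); the first two samples
# of each epoch are taken before S1.
epochs = [
    [1.0, 1.0, 3.0, 6.0, 9.0],
    [-2.0, -2.0, 2.0, 3.0, 4.0],
    [0.0, 0.0, 3.0, 5.0, 7.0],
]

averaged = average_epochs(epochs, baseline_samples=2)
print(averaged)
```

Activity that is not time-locked to the stimulus tends to cancel across trials, which is why slow shifts such as CNV become visible only after several trials are averaged.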

Evoked potentials to S1. Work on the
relationship between amplitudes of CNV and
EPs to S1 has provided inconsistent findings.
Evidence of a positive relationship has been
reported during CNV acquisition (Walter,
1964c), speed sets for response to S2 (Waszak
& Obrist, 1969), distraction (McCallum &
Walter, 1968b), and boredom and drowsiness
(Cohen & Walter, 1966; Walter, 1965c). In
addition, Cohen (1969) has reported a
significant positive correlation between
amplitudes of CNV and EPs to S1. On the other
hand, in two systematic studies of CNV and
EPs to S1 no significant differences were found
between CNV-present and CNV-absent
conditions (Rebert & Knott, 1970; Small &
Small, 1970). Rebert and Knott reported a
tendency for most EP components to be more
negative when CNV occurred. No change in
EPs to S1 accompanied increased CNV
amplitude resulting from frontal leucotomy in
a phobic patient (Walter, 1966a) or from
varying content of S2 from trial to trial
(Cohen & Walter, 1966). Finally, no
relationship was observed between spontaneous
changes in CNV and the negative component
(latency of 110-150 milliseconds) of EPs to
S1 (Hillyard, 1969a).

Evoked potentials to S2. Early informal
observations suggested that amplitude of EPs
to S2 was inversely related to amplitude of
CNV measured during acquisition (Walter,
1964c) or in psychiatric patients (Walter,
1964a, 1966a). In a systematic study of this
problem, no significant differences were found
in EPs to S2 for CNV-absent (no MR) and
CNV-present (MR) conditions (Small &
Small, 1970). There was one report that CNV
obscured measurement of EPs to S2 (Cohen,
1969).

There have also been informal reports of a
parallel association of CNV and EPs to S2;
for example, both increased in amplitude with
variation in S2 (Cohen & Walter, 1966) and
decreased in amplitude in a patient
radiotelemetered for the first time
(Walter et al., 1967). This relationship is
particularly clear for a late (300-millisecond
latency) positive component of EPs that is
related to information processing (Sutton,
Braren, Zubin, & John, 1965; Sutton, Tueting,
Zubin, & John, 1967). This P300 component
occurs at approximately the same time
as the positive component of the terminating
or descending (positive-going) limb of CNV.
Experiments have demonstrated that
attention (instructions to focus attention
on a stimulus) elevated P300 as well as
CNV-like waves occurring before the
stimulus (Donchin & Cohen, 1967; Donchin &
Smith, 1970). (Eye movements were
monitored.) Since Donchin and Cohen (1967)
interpreted P300 as reflecting selective
attention, their work also suggests that CNV may
be related to attention. Similarly, amplitudes
of both CNV and P300 were significantly
elevated when S2 was detected, compared to
when it was missed (Hillyard, 1969b). This
finding was interpreted as indicating that
CNV is related to attending to S2.
Findings on CNV and EPs to S2 indicate a
positive relationship, particularly for P300.
Evoked potentials to extraneous stimuli.
Studies have been carried out to determine
the relationship of CNV and EPs to stimuli
extraneous to S1 and S2. In one experiment of
[...] and Ellingson, S1 was the onset of a
500-Hz. tone and S2 its termination. An
extraneous stimulus was presented .5 seconds
following S1 and was either a 500-Hz. tone
pip, a 6000-Hz. tone pip, or a light flash.
Presence or absence of a MR to S2 produced
high- and low-CNV conditions. The only
significant elevation in EP amplitude resulting
in the high-CNV condition occurred for the
500-Hz. extraneous stimulus, which had the
same frequency as the 1-second CNV-producing
tone. This finding of a selected increase in
EP was interpreted as evidence that CNV
reflects selective attention and as
“inconsistent with predictions derived from general
arousal or response mobilization hypotheses
[Ellis, 1969, p. 664].” Two other sources of
information were cited as supporting the
attention hypothesis. First, CNV amplitude and
RT to S2 yielded a significant interindividual
correlation only when the extraneous
stimulus was 500 Hz. Second, subjects judged
their level of attention to be significantly
greater for the high- than for the low-CNV
condition.

In a similar experiment, somatosensory EPs
occurring within and between S1-S2 intervals
were studied for high-CNV (MR present) and
minimal CNV (no MR) groups (McAdam,
1968, 1969a). For both groups, averaged
amplitudes for most components of
somatosensory EPs occurring within the S1-S2
interval were of significantly lower voltage than
for EPs occurring in the intertrial intervals.
For the high-CNV group, latencies to peaks of
the late components (after 200 milliseconds)
were significantly shorter for interstimulus
EPs than for intertrial EPs. These results were
interpreted as indicating that “CNV is
accompanied by heightened neural excitability
[McAdam, 1968, p. 286]” and as supporting
Walter’s hypothesis that CNV is associated
with “cortical priming” (Walter et al., 1964).

Finally, no difference was found between
EPs to light flashes (extraneous stimuli)
occurring .5 or 1.0 seconds before S2 in
conditions with CNV present (S1-S2 present) and
CNV absent (S2 only) (Donald, 1968, as cited
by Karlin, 1970). Furthermore, no
relationship was found between CNV and EPs to
extraneous stimuli, whether attended to or
ignored.

In summary, the results showing that CNV
is related to EPs to S2 and EPs occurring
within the S1-S2 interval indicate that CNV
is related to selective attention.

Fig. 4. Examples of CNV, motor potential, and
Bereitschaftspotential (readiness potential). On the
left is CNV (n = 6 trials) recorded from vertex
to right mastoid. Relative negativity at the vertex
is upward. On the right (upper trace) is a motor
potential (n = 400 responses) associated with
dorsiflexion of the right wrist and recorded from the
left Rolandic area 4 centimeters from midline to a
linked ear reference. The lower trace on the right is
the summation of the rectified EMG resulting from
muscle contraction. The slow negative component
“N1” is the readiness potential. The right side of
this figure has been adapted from Figure 2 in
“Topography of the Human Motor Potential” by
H. G. Vaughan, Jr., et al. in Electroencephalography
and Clinical Neurophysiology, Volume 25, 1968, by
courtesy of the author and the Elsevier Publishing
Company.


Bereitschaftspotential and Motor Potentials

Shortly after the initial report on CNV
(Walter et al., 1964), Kornhuber and Deecke
reported the appearance of a “slowly [...]
cortical potential of 10-15 μV [Kornhuber &
[...]],” which begins to rise [...] movements
and peaks at [...] (Walter, 1968). Kornhuber
[...] called this potential [...] “readiness
potential.” [...] Figure 4, the readiness [...]
1965), and [...] distributed (Deecke et al.,
[...]; [...] Deecke, 1965; Kornhuber et al.,
1969). Like CNV, the readiness potential can be
enhanced by monetary reward (McAdam & Seales,
1969). It has been suggested that CNV and
the readiness potential are separate
phenomena since CNV occurs without a MR
(motor response) and does not show
consistent lateral asymmetry (Cant et al., [...]).
Despite the usual dependence of the readiness
potential on a MR and its most prominent
appearance above motor areas of cortex,
Kornhuber et al. (1969) have concluded that the
readiness potential is only indirectly
associated with motor processes.16

16 When the readiness potential begins to occur
prior to spontaneous hand or foot movements, a
small EP appears, which Walter (1967) speculated
may be evoked by the decision (internal [...])
to make a movement, that is, a literal (physiological)
impulse to action. Walter (1967) has demonstrated
that when a television display was presented [...]
second after the MR, the readiness potential persisted
until the picture was shown, and the increasing
negative voltage of the readiness potential “instructed”
a computer to present the picture before the MR
occurred. (This procedure is called “autostart.”)

Other cerebral potentials related to
voluntary motor movements are “motor potentials”
(Gilden, Vaughan, & Costa, 1966; Vaughan,
1969; Vaughan et al., 1968). As can be seen
in Figure 4, a motor potential has four
components. The term “motor potential” has also
been used to designate the brief negative surge
seen before P2 in Figure 4 (Becker, Deecke,
Hoehne, Iwase, Kornhuber, & Scheid, 1968;
Deecke et al., 1969). The early slow negative
component of the motor potential shown in
Figure 4 appears to be the readiness potential
of Kornhuber and Deecke and was
interpreted by Vaughan et al. (1968) to represent
preparatory motor set and readiness for
movement. Since the early negative shift of the
motor potential resembles CNV and appears
without an external warning signal, Gilden
et al. (1966) suggested that CNV may be
an indication of preparation for movement.
Both CNV (Cohen et al., 1967; Walter,
1967) and motor potentials (Vaughan et al.,
1968) follow a similar antero-posterior
distribution, being maximal above the Rolandic
line and sloping off in amplitude anteriorly
and posteriorly. However, unlike motor
potentials, CNV has occurred in the absence
of a motor response and has followed a
different lateral topography. Furthermore,
motor potentials appear maximal over the
hemisphere contralateral to movement
(Vaughan et al., 1968), whereas CNV is
reported to be bilaterally symmetrical ([...]
et al., 1966; Cohen, 1968; Cohen [...],
1965; Low, Borda, Frost, & Kellaway, 1966).

At present, there is some confusion
concerning the similarities and differences
between the CNV of Walter, the readiness
potential of Kornhuber and Deecke, and the
early, slow negative component of the motor
potential of Vaughan et al. This confusion has
been produced, in part, by the variety of terms
used to describe these interrelated slow
potential phenomena. One possible solution to
this definitional problem is to distinguish
between CNV, a motor readiness potential, and

When the subject was aware that his [...]
produced the picture, he no longer [...]
the MR. Subjects also learned to [...] by
[...] CNV, which replaced the [...]
(“autostop”). Walter (1966b, 1967) called the
readiness potential an “Intention Wave.”


spontaneous slow potentials on the basis of 
experimental operations. The following clas- 
sification of these three types of potentials is 
based on the classical S1-S2-MR situation:


1. S1-S2-MR paradigm: Where a motor
response is required to S2, both CNV and a
motor readiness potential occur, and the result
is a hybrid CNV or CNV complex.

2. S1-S2 paradigm: Where only attention
to S2 is required (such as in a perceptual
discrimination involving S2) without an
immediate motor response, CNV occurs.

3. MR paradigm: Where subjects are
required to make a motor response in the
absence of external stimuli, a motor readiness
potential occurs.

4. Absence of S1, S2, and MR: Where
spontaneous slow negative potentials occur in
the absence of both external stimuli and a
discrete motor response, neither CNV nor a
motor readiness potential is considered to
have occurred.
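
The four-way classification above is an operational definition, so it can be written as a simple decision rule. The following is only a sketch of that rule: the function name `classify_slow_potential` and the returned labels are hypothetical, and combinations of events that the schema does not mention (for example, S1 presented alone) are reported as not covered rather than guessed at.

```python
def classify_slow_potential(s1: bool, s2: bool, motor_response: bool) -> str:
    """Label a slow negative potential by the experimental operations
    present on the trial, following the four-part schema in the text."""
    if s1 and s2 and motor_response:
        # Item 1: S1-S2-MR paradigm.
        return "CNV complex"
    if s1 and s2:
        # Item 2: attention to S2 without an immediate motor response.
        return "CNV"
    if motor_response and not (s1 or s2):
        # Item 3: motor response with no external stimuli.
        return "motor readiness potential"
    if not (s1 or s2 or motor_response):
        # Item 4: spontaneous slow potential, counted as neither.
        return "neither"
    # Any other combination is outside the schema as stated.
    return "not covered by the schema"


for events in [(True, True, True), (True, True, False),
               (False, False, True), (False, False, False)]:
    print(events, "->", classify_slow_potential(*events))
```

Keeping the rule explicit makes the central point of the schema visible: the hybrid “CNV complex” arises only when both the S1-S2 contingency and a motor response are present.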


This schema preserves the original con- 
ception of CNV as a slow cerebral potential, 
which is dependent on the "contingency"
between S1 and S2 (Walter et al., 1964) and
underscores the fact that CNV can occur with- 
out an overt MR. It is important to note that 
the difference in amplitude between CNV with 
and without a MR (Irwin et al., 1966; 
Straumanis et al., 1969) is of the same order 
of magnitude (10-15 μV) as the readiness
potential produced by a MR (Kornhuber & 
Deecke, 1965). Undoubtedly, more precise 
definitions and more elaborate nomenclature 
will be possible when more definitive informa- 
tion becomes available on the relative size 
and topographical distribution (lateral and 
anterior-posterior) of CNV and the motor 
readiness potential. In addition, further work 
is needed on the readiness potential, which 
occurs in the absence of an overt MR, for 
example, the “intention wave" of Walter 
(1967). 

In conclusion, CNV and the motor
readiness potential are phenomena that can be
defined separately by the experimental
operations used in their development. However, in
the usual CNV paradigm (S1-S2-MR), they
occur together and give rise to a hybrid wave
or “CNV complex.”


Autonomic Activity and Related Functions


There are significant relationships between
changes in CNV amplitude and autonomic
activity and related functions. This work is
reviewed here for galvanic skin response
(GSR), electrocardiogram (EKG) and heart
rate, respiration, and electromyogram
(EMG).

General autonomic activity. Early informal
observations indicated that in normal subjects
as CNV developed into a stable response,
generalized autonomic imbalance subsided
and phasic autonomic activity became
coordinated with response to S2 (Walter,
1966a). However, with anxiety patients and
in some normal children, these generalized
autonomic discharges continued and became
enhanced with stabilization of CNV. When
these discharges were reduced by tranquilizing
drugs, there was no CNV change. Delinquents
with psychopathic personalities showed
minimal evidence of CNV and correspondingly
little autonomic activity during S1-S2-MR
trials (Walter, 1964a, 1966a).

Galvanic skin response. The study of
GSR as a possible artifact in CNV
measurement has shown that averaged CNV and
averaged GSR have different time relations
and morphologies (Low, Borda, Frost, &
Kellaway, 1966). Whereas the latency of
CNV is usually between .4 and .5 seconds,
the onset of GSR is about 1.5 seconds. The
rise time of CNV is usually below .6 seconds;
for GSR the rise time is about a second.
The abrupt return of CNV to base line after
a MR to S2 is not seen in the GSR. In
addition, [...] can [...] trials where GSR
[...] from [...] the cortex. Finally, as CNV
increases in amplitude with repeated
presentations of S1-S2, GSR typically declines
in amplitude. The evidence presented by Low,
Borda, Frost, and Kellaway (1966)
constitutes a convincing case against GSR as an
[...] GSR [...] that CNV is cerebral in
origin. No work has been done on CNV and
scalp GSR.

Heart rate and clectrocardiogram. There is 
evidence that CNV amplitude and heart rate 
(HR) level are inversely related. For in- 
stance, lower CNV amplitude and higher HR 
levels were found in chronic anxiety neurotics 
compared to normal control subjects. (Mc- 
Callum & Walter, 1968b). In a phobic pe 
tient, frontal leucotomy led to larger CNV 
amplitude and lower HR level (a decrease 
from 125 to 90 beats per minute) (W alter, 
1966a). An inverse relationship between CNV 
amplitude and HR level was also found ji 
work on CNV and distraction (Tecce 9 
Scheff, 1969; see also Footnote 9). Both 
phasic distraction (extraneous stimuli pre 
sented before and within the S;-5» interval) 
and sustained distraction (adding by 7s) r€ 
duced CNV amplitude significantly and ele- 
vated HR level significantly. In a normal sub- 
ject who had become bored and cepts 
during testing, CNV suppression was 4 
companied by decreased levels of 3 
(Walter, 1965c). In an experiment on phasic
changes in HR (measured within the S1-S2
interval), CNV amplitude was related to both
HR acceleration and deceleration (Connor
& Lang, 1969). Deceleration of HR was
related to CNV amplitude in an operant
response situation where subjects responded
between 15 and 19 seconds without the presence
of external S1 and S2 (Lacey & Lacey, 1970).
When subjects were given information on the
correctness of their performance 4.5 seconds
after responding, CNV was absent in the
response-information feedback interval, but
HR deceleration was present.18

Overall, CNV amplitude appears related to
lowered HR levels and to HR deceleration.
On the basis of latency and time course, the
electrocardiogram has been ruled out as a source
of CNV (McCallum & Walter, 1968b).

Respiration. When CNV occurred during
the inspiratory phase of respiration, “negativity”
was enhanced; when CNV occurred
during expiration, the negativity
of CNV was reduced or reversed (Gullickson
& Darrow, 1967). This relationship between
CNV and respiratory phase was clearest during
conditions when subjects were aroused
by external stimuli and was not clear when
external stimulation was absent. The authors
suggested that heightened activation is needed
to demonstrate the CNV-respiration relationship.
In the study of chronic anxiety neurotics
(McCallum & Walter, 1968b), the lower CNV
amplitude found in the patients (who were
on drugs) compared to normal controls was
not accompanied by any group difference in
respiration rate.

18 J. I. Lacey and B. C. Lacey. Visceral
regulation of brain and behavior. Paper presented at
the meeting of the American Psychological Association,
Miami, September 1970.

Electromyogram. In normal subjects, when
only S2 and a MR occurred, EMG recordings
indicated “a rather massive and prolonged
contraction [in the muscles of the responding
limb], often with increased tonic activity
between trials [Walter, 1964a, p. 314].”
With the introduction of S1 as a cue to
anticipate S2, CNV developed, tonic muscle
activity decreased (between S1-S2-MR trials),
and EMG bursts became briefer and more
discretely related to the anticipated MR to
S2 (Walter, 1964a, 1966a). When S2 was
omitted without warning, slight EMG activity
sometimes followed S1, suggesting an
anticipatory response to S2 (Walter et al.,
1964). Similarly, when a MR was made to
S2, more EMG occurred in the S1-S2 interval
(and CNV amplitude was larger) compared
to a no-response condition (Irwin et al., 1966).
In a classical conditioning situation involving
a bell (S1) followed in 2 seconds by a
verbal command “clench the fist” (S2), and
subject's fist clench, anticipatory muscle
potentials appeared in the S1-S2 interval
EN US, 1964). Liberson (1965) has linke ped 
this activity to CNV and refers to
“expectancy muscle potentials.”19


Dote: 


eW ; the fist of 
thei, Vhen subjects are required to clench 


ing the 
Sng, Mnpreferred hand prior to and = Ness 
in «c P lerval, there is a relatively greater, MC 
live arly CN» compared to “late ENNY: guteoned 
Muscle pancement in CNV magnitude a typically 
Ww © tonus was clearest in individuals W s tive am- 
. Type B CNVs and, like the Se*? 


Jy CNV” is re- 
Ine effect, suggests that — of increased 


© ar - kinds - 
seu), Arousa] processes. Other ki (during rest 


E SR. ved 3 
Petiogs activity have been ope a at a time 


When "between blocks of CNV tria to 2 hours 

lta, CN enhancement occurred a . cf. Foot- 

“ote TÉXtro-amphetamine administration; on- 
10, 9 phetamine 4 included € 


A si 
These behavioral changes 


In contrast to the finding of decreased tonic 
muscle activity with CNV development in 
normal subjects, for anxiety neurotics muscle 
tension between trials sometimes increased 
with continued S1-S2-MR presentations. For
these patients, there was a periodic, gradual 
increase and decrease of EMG activity, rather 
than the coordinated bursts of EMG seen 
in normal individuals (Walter, 1966a). In 
addition, McCallum and Walter (1968b) 
reported that accompanying the finding of 
lower CNV amplitude in chronic anxiety patients
is a less “efficient” (presumably less
discrete) EMG response during button pressing
to S2 by the patients (who were on
drugs). CNV appears to be independent of 
distortion by muscle activity on the scalp. 
This conclusion is based partly on the find- 
ing of similar CNVs recorded from scalp and 
subcutaneous electrode placements (Walter, 
1965b). 

Available evidence indicates that CNV is 
free of autonomic activity as a source of arti- 
fact, although phase of respiration deserves 
further study in this regard. The finding 
that elevated CNV amplitude was accom- 
panied by decreased generalized autonomic 
activity, lowered HR levels, and decreased 
tonic muscle activity in normal subjects sug- 
gests an inverse relationship between CNV 
amplitude and tonic arousal levels. CNV 
amplitude was related to both lowered phasic 
arousal (HR) and increased phasic arousal 
(EMG and respiration). 


Eye-Movement Potentials 


The most serious methodological problem 
in CNV research is the occurrence on the 
scalp of CNV-like potentials arising from eye 
movements. Several methods have been pro- 
posed to remove these effects. 


continuous foot-pedaling motions (subjects are in a
reclining chair); frequent, repetitive hand movements
(fingering clothing and the telegraph key, drumming
with fingers, rubbing hands, and stroking the arm
of the reclining chair). During this time, subjects
reported a strong and persistent need for movement,
for example, “I wanted to keep pressing the telegraph
key” (after CNV trials terminated). Thus, the
increase in magnitude of early CNV produced by
dextro-amphetamine appears to be mediated, at least
in part, by myogenic activity.


Effects of ocular potentials on CNV. Early
reports indicated that CNV was not related
to “electrical fields of the eyes or eye muscles
[Walter, 1965b, p. 47]” and that CNV was
entirely cerebral in origin (Cohen et al.,
1965). Although horizontal and vertical eye
movements occurred during CNV acquisition
trials, polygraph and photographic recordings
indicated no consistent relationship between
CNV development and eye movement (Low,
Borda, Frost, & Kellaway, 1966). Furthermore,
CNV was recorded in an individual
with no eyeballs and in two Parkinsonian patients
showing similar CNVs recorded from
scalp and epidural electrode placements (Low,
Borda, Frost, & Kellaway, 1966). This
similarity of CNVs recorded directly from 


alp has been reported 
(1964a, 1965b, 1965c, 
that CNV is not con- 


i racerebral sources as 
ocular potentials (Walter et al., 


The convincing evidence that CNV is a
genuine electrical phenomenon free of eye-
movement-potential artifacts is matched by
equally compelling data that eye movements
can contaminate CNV measurement. In work
on DC potentials, Kohler, Held, and
O'Connell (1952) demonstrated that EEG base-line
shifts occur on the human scalp with vertical
and lateral eye movements and eyeblinks.


FIG. 5. Effects of downward eye movements on the polygraph EEG trace
and on averaged CNV (n = 6 trials). Electrode placements are vertex (Cz)
and the right mastoid process. (Reprinted from “Contingent Negative Variation
and Individual Differences,” by J. J. Tecce, in the Archives of General Psychiatry,
by permission of the author and the American Medical Association.)
(Event labels in the figure: Tone, Keypress.)

The two eye movements most destructive to CNV
measurement are vertical rotations of the
eyeballs and eyeblinks (which also produce
vertical excursions of the eyes). Downward eye
movements generate electro-oculogram (EOG)
waves that are similar to CNV in polarity and
time course and produce CNV-like waves on
the scalp (Cant et al., 1966; Hillyard &
Galambos, 1970; Low, Borda, Frost, & Kellaway,
1966; Wasman, Morehead, Lee, &
Rowland, 1970; Waszak & Obrist, 1969).
The summation of pseudo-CNV and
genuine CNV has occurred in eyes-closed and
eyes-opened conditions, where eyeball rotations
were of comparable occurrence, and
when normal subjects had difficulty following
instructions to fixate (Wasman et al., 1970).
This artifactual effect of involuntary eye
movements (including eyeblinks)
appeared stronger for the nondominant than
the dominant eye (Wasman et al., 1970) and
followed an antero-posterior gradient (Hillyard,
1969a; Hillyard & Galambos, 1970).
Although eye-movement potentials decrease
with distance from the eyes, this effect is present
even at the occiput for “voluntary eye rotations
of small magnitude [Rowland, 1968,
p. 66].” (Rowland also suggested that pupillary
response may be an additional peripheral
factor that contaminates CNV.) Ocular artifacts
are robust for vertex-mastoid recordings
(Hillyard & Galambos, 1970; Straumanis
et al., 1969), which are the most frequently
used placements in CNV research. At this site,
the artifactual effects can be comparable in
magnitude, for example, 6.4 µV (Hillyard,
1969a), to the 5-10 µV differences reported
as experimental effects (Hillyard & Galambos,
1970). As can be seen in the upper part of
Figure 5, a voluntary downward eye movement
resulted in a considerable negative base-line
shift in the EEG trace recorded from
vertex to the right mastoid. With these
placements, lateral eye movements to the
right also led to negative base-line shifts,
although with reduced amplitude. As can be
seen in the lower part of Figure 5, CNV
amplitude was doubled when voluntary
downward eye movements were synchronized
with the onset of S1, compared to an eyes-fixed
condition. A significant within-subjects
correlation coefficient has been reported for
voluntary downward eye movements and
scalp-recorded CNV (Connor & Lang, 1969).
In conclusion, ocular potentials can drastically
affect CNV recorded in frontal and central
(vertex) scalp locations.
Removal of effects of ocular potentials
from CNV. Several methods have been proposed
to protect scalp-recorded CNV from
the powerful effects of ocular potentials.
Where subjects are capable and experimental
conditions permit, the technique
has involved the prevention of eye movements
by a visual fixation method. Normal subjects
have successfully maintained visual fixation
on a target prior to and during the critical
S1-S2 period in order to prevent contamination
of CNV by ocular potentials (Rebert &
Knott, 1970; Waszak & Obrist, 1969), although
there appear to be individual differences
in the ability to maintain a fixed
position (Wasman et al., 1970). A
i Junct to this technique, as well cod the 
effi record vertical EOG as a mon! bs con- 
Sister’ of subjects’ efforts to fixate, 9! ossible 
Wit, Visual fixation may ed ba et al., 
195, 2Me norma] individuals (W “on of sub- 
leet. and with special Rm (Strau- 

S s chi a 
This et as een children IT 
“Myer, & Kendall, 1969). The sod
mits the use of a second method for con-
trolling ocular contamination of CNV,
namely, the off-line exclusion from data 
analysis of all trials where eye movements 
occur. This technique can be accomplished 
either by “eyeball” selection of trials having 
eye movements (Waszak & Obrist, 1969) or 
by automatic rejection of these trials (Small 
& Small, 1970). This procedure has the dis- 
advantage of data selection and possible loss 
of relevant information. For example, if eye 
movements are related to disturbances in 
attentional processes, then exclusion of these 
eye-movement trials would result in a loss of 
psychologically meaningful information of 
possible relevance to CNV development. 
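
As a rough illustration of the automatic rejection approach, the sketch below drops every trial whose EOG trace exceeds a peak-to-peak criterion and averages the remainder. It is not the procedure of Small and Small (1970); the function names, the peak-to-peak criterion, the 50-µV default, and the sample values are all assumptions made for this example.

```python
from statistics import mean


def reject_eye_movement_trials(eeg_trials, eog_trials, max_eog_range_uv=50.0):
    """Keep only EEG trials whose EOG trace stays within the criterion.

    Each trial is a list of samples in microvolts. The peak-to-peak
    criterion and its default value are assumptions for illustration.
    """
    kept = []
    for eeg, eog in zip(eeg_trials, eog_trials):
        if max(eog) - min(eog) <= max_eog_range_uv:
            kept.append(eeg)
    return kept


def average_trials(trials):
    """Average the retained trials sample by sample."""
    return [mean(samples) for samples in zip(*trials)]


# Invented data: three trials; the second contains a large EOG shift.
eeg = [[0.0, -8.0, -12.0], [0.0, -20.0, -30.0], [0.0, -10.0, -14.0]]
eog = [[0.0, 5.0, -5.0], [0.0, -80.0, -120.0], [0.0, 10.0, 0.0]]

kept = reject_eye_movement_trials(eeg, eog)
rejected = len(eeg) - len(kept)
averaged_cnv = average_trials(kept)
print(rejected, averaged_cnv)
```

Reporting the number of rejected trials alongside the average makes the extent of data selection visible, which bears on the loss-of-information concern raised above.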

A third method for controlling eye-move- 
ment effects on CNV involves the use of a 
potentiometer to subtract ocular potentials 
reflected in anterior areas from ocular effects 
on CNV recorded at vertex (Walter, 1967). 
This procedure has received both positive 
(Hillyard & Galambos, 1970) and negative 
(Wasman et al., 1970) evaluations. Wasman 
et al. (1970) used this technique unsuccess- 
fully, partly because subtracting larger frontal 
responses (which reflected large EOG po- 
tentials) from smaller vertex responses yielded 
positivity of CNV.
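
The arithmetic behind the potentiometer method can be sketched as a weighted difference between two channels. The example assumes, for simplicity, that the anterior channel carries only the ocular potential and that the potentiometer setting acts as a single scale factor k; the function name and all values are hypothetical. It also shows how the kind of positivity reported by Wasman et al. (1970) can arise when too large a share of the frontal response is subtracted.

```python
def subtract_ocular(vertex_uv, anterior_uv, k):
    """Subtract a fraction k of the anterior record from the vertex record."""
    return [v - k * a for v, a in zip(vertex_uv, anterior_uv)]


# Invented values in microvolts (negative = negative-going shift).
true_cnv = [0.0, -5.0, -10.0]
ocular_at_anterior = [0.0, -10.0, -20.0]
spread_to_vertex = 0.25  # assumed fraction of the ocular potential reaching vertex

# Vertex record = genuine CNV plus the attenuated ocular potential.
vertex = [c + spread_to_vertex * o for c, o in zip(true_cnv, ocular_at_anterior)]

# k matched to the assumed spread recovers the genuine CNV.
well_set = subtract_ocular(vertex, ocular_at_anterior, k=0.25)

# k set too high removes more than the artifact and reverses polarity.
over_set = subtract_ocular(vertex, ocular_at_anterior, k=1.0)

print(well_set)
print(over_set)
```

In practice the anterior site may also record genuine CNV, so a subtraction of this kind could remove part of the signal of interest as well as the artifact.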

A fourth, off-line technique for controlling 
eye-movement artifacts involves the development
of a regression function for voluntary
eye movements and artifactual effects on CNV 
and the application of the regression equation 
to experimental situations involving involun- 
tary eye movements (Hillyard & Galambos, 
1970). This procedure permits the analysis of 
all CNV trials, with and without eye move- 
ments, and provides a quantified estimate of 
ocular artifacts. However, it requires con- 
siderable time and effort and must be done 
separately for each subject (Wasman et al., 
1970).

In summary, available evidence clearly implicates
eye movements as the bête noire of
CNV research. CNV is a cerebral phenomenon
that occurs in the absence of eye movements,
but it is easily affected by ocular potentials.
Of particular concern are eyeblinks and slow
scalp potentials resulting from other vertical
eye movements. These potentials, especially
when they occur within the S1-S2 interval,
obscure the accurate measurement of CNV.
The sensitivity of CNV recordings to distortion
by eye movements suggests that presentation
of averaged CNVs be accompanied
by averaged EOGs. Under normal conditions,
the combined use of visual fixation training
and exclusion of trials with eye movements
results in satisfactory control of ocular contamination
of CNV measurement.
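
The regression technique attributed above to Hillyard and Galambos (1970) can be illustrated with a minimal sketch: a line is fitted to calibration data from voluntary eye movements, and the fitted line is then used to estimate and remove the artifact on experimental trials. The details of the published procedure are not given here, so the ordinary least-squares fit, the function names, and all numerical values are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correct_cnv(measured_cnv_uv, eog_shift_uv, slope, intercept):
    """Remove the artifact predicted from the EOG shift on the same trial."""
    return measured_cnv_uv - (slope * eog_shift_uv + intercept)


# Calibration for one subject (invented values): voluntary eye movements of
# graded size, with the EOG shift and the resulting vertex shift in microvolts.
calib_eog = [-100.0, -50.0, 0.0, 50.0, 100.0]
calib_vertex = [-10.0, -5.0, 0.0, 5.0, 10.0]
slope, intercept = fit_line(calib_eog, calib_vertex)

# An experimental trial with an involuntary eye movement.
corrected = correct_cnv(measured_cnv_uv=-16.0, eog_shift_uv=-60.0,
                        slope=slope, intercept=intercept)
print(round(slope, 3), round(corrected, 3))
```

Because the slope depends on the individual and on electrode placement, the calibration step would have to be repeated for each subject, which is the cost noted by Wasman et al. (1970).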


CNV AND INDIVIDUAL DIFFERENCES


As in work on sensory evoked potentials 
(Tecce, 1970), interindividual variation in 
CNV development is considerable and perva- 
sive (Tecce, 1971). This unaccounted for 
variance has been related to a variety of in- 
dividual difference variables, which can be 
classified according to the populations of sub- 
jects employed, namely, normal adults, age 
groups, psychopathological groups, and pathophysiological
groups.


Normal Adults 


Significant negative correlations have been
reported between CNV amplitude and questionnaire
scores of anxiety, obsessionalism,
and depression (McCallum & Walter, 1968a),
but not for CNV morphology and general
psychopathology or obsessional anxiety, as defined
by psychometric tests (Bostem et al.,
1967). In other work on anxiety (Knott &
Irwin, 1967, 1968), high scorers on the
Taylor Manifest Anxiety Scale (MAS) were
expected to have higher CNV amplitude than
low scorers, based on assumptions from
Spence's behavior theory that the two anxiety
groups differ in drive level (Spence, 1958;
Taylor, 1956). No difference in CNV amplitude
was found between the high- and low-anxiety
groups in conditions where a MR
to S2 was required, where no MR to S2 was
required, and where S2 was a low-intensity
shock. However, where high (painful) shock
could be anticipated as S2, CNV amplitude
was lower for high-anxiety subjects than for
low-anxiety subjects. This unpredicted finding
was interpreted as due to the high-anxiety
subjects having “higher base-line cortical
negativity,” which caused a ceiling to be
reached more quickly by them than by the
low-anxiety subjects (Knott & Irwin, 1967,
1968). It is also possible that during the
high-shock condition, high-anxious subjects
were highly distracted by worry and preoccupation
with task-irrelevant
thoughts (Tecce, 1971). Taylor, author of
the MAS, and Child (1954) have pointed out
that high-MAS scorers are vulnerable to distraction.


CNV and Age Differences 
The development of CNV is related to age 
differences and special children groups.
Walter has provided a cursory description of
the development of CNV between the ages
of 3 and 21 years (Walter, 1964a, pes 
1966a). With conventional stimuli, CNV
showed a fragile existence between the ages
of 3 and 7 and was highly dependent on
encouragement and reassurance (Walter,
1964a, 1966a). On the other hand, by using
interesting stimuli Gullickson (1970) obtained
robust CNVs (larger than in adults) in
2- and 3-year-olds without the use of a MR.
Stimuli were a glide tone (a tone of changing
frequencies) of 1 second as S1 and either
novel color patterns, novel sound patterns,
or a combination of the two presented for d l 
seconds as S2. Subjects receiving combined
color and sound as S2 were “more attentive”
and had larger CNVs than subjects receiving
either one alone. (Eye-movement potentials
were monitored.) Large positive slow potentials
occurred during habituation and extinction
trials (S1 only).

Further informal observations by Walter


(1964a, 1966a) suggested that from 8 jo? 
years of age, encouragement and motivation
by competition can engender adult-looking
CNV. At age 20, CNV was stable, but Y 
Some individuals depended on were y 
couragement, Tn a systematic study of sed 
year-olds, CNV recorded at vertex increased
in amplitude up to the age of 15 and was
smaller in children than in adults.21 In contrast,
Low, Borda, Frost, and Kellaway
(1966) reported high CNVs in children of
ages 4, 8, and 11 and concluded that CNV
amplitude is larger in children than in adults
(Low, Borda, Frost, Kellaway, & Gol, 1965).
These discrepant findings may be due, in part,
to lack of control of eye-movement potentials.


20 J. A. Taylor. Manifest anxiety, response interference,
and repression. Paper read at a symposium
on “Experimental Foundations of Clinical Psychology,”
University of Virginia Medical School,
1959.

21 J. Cohen. The development of the contingent
negative variation with expectancy in children. Paper
presented at the meeting of the Society for
Psychophysiological Research, Washington,

The topography of CNV was found to differ
in children and adults. In contrast to adults,
in whom CNV amplitude was greater in
anterior than in posterior areas, in children
up to 12 years of age CNV amplitude was
found smaller in frontal than in parietal areas.
The adult pattern of larger CNV in frontal
areas emerged after age 12, primarily because
of an increase in frontal CNV. As with adults,
CNV in children appeared bilaterally symmetrical
over the hemispheres (Cohen, see
Footnote 21). In contrast to adults, in whom
CNV typically declined abruptly to base line
upon response to S2 (see Figure 2), children
show a gradual termination, a characteristic
interpreted as an index of incomplete maturational
development (Cohen et al., 1965).
Walter has observed that the small and inconsistent
CNV of children was increased by
“persuasion, instruction, admonition, and
competition [Walter, 1964b, p. 434].” The
efficacy of these social factors was first seen in
the frontal areas. Since children are notoriously
distractible, elevation in CNV amplitude
by social reinforcement may be due primarily
to increased attention to the experimental
task. This view is supported by Gullickson's
(1970) demonstration of large CNVs in preschool
children through the use of attractive
stimuli and by the finding of CNV disappearance
when a 6- or 8-year-old child was
stimulated with peripheral shock during the
S1-S2 interval (Fenelon, 1968). Although the
shock technique was intended to add “surplus
arousal” to the child's activation state, it
was most likely very distracting as well.

Amplitude of CNV increases with age up
to the late teens and in children is
sensitive to change by social stimuli. Attentional
processes appear to play an important
role in CNV development in children,
although other organismic changes that take
place during physiological maturation are
most likely important. Further work with
control of eye-movement potentials is needed
to provide normative data on CNV in children.
The use of attractive stimuli and performance-relevant
rewards (e.g., candy or
money for fast responses to S2) should
facilitate CNV development.


Psychopathological Groups 


The study of CNV in psychiatric patients
has basically involved neurotic and psychotic
patients. The four types of neurotic patients
studied are chronic anxiety neurotics, phobic
patients, obsessive-compulsive individuals,
and hysterics. Amplitude of CNV was reported
significantly lower for high-anxiety
chronic neurotics than for normals (McCallum
& Walter, 1968a, 1968b); the patients
(on drugs) also showed more CNV
reduction from distraction (McCallum,
1967). Both anxiety neurotics and phobic patients
showed complete disappearance of
CNV after a few omissions of S2 (equivocation)
(Walter, 1964a, 1966a). The anxiety
patients required many (sometimes hundreds)
of trials for CNV to return (Walter, 1965b).
One interpretation of the quick disappearance
and persistent absence of CNV involves distraction
from covert verbalizations that accompany
worry and fear (Tecce, 1971).

Early observations suggested that CNV
amplitude was higher in obsessive-compulsive
patients than in normals (McCallum &
Walter, 1968b). Other studies have indicated
no significant difference between obsessionals
or hysterics and normal controls (Dongier &
Bostem, 1967; Timsit, Koninckx, Dargent,
Fontaine, & Dongier, 1970), although CNV
amplitude was more often absent (Dongier
& Bostem, 1967) and was significantly higher
(Timsit et al., 1970) in hysterics than in
obsessives. An important finding by Timsit
et al. was that excessive prolongation of the
terminating or descending (positive-going)
limb of CNV (slower resolution of CNV)
was related to extent of psychopathology.
That is, prolonged CNV occurred significantly
more often among psychotics (including
schizophrenics) than among obsessionals, and
significantly more often among obsessionals
than among normal controls. Since a tendency
for slow CNV return to base line has
also been found in children, it is tempting to
speculate about mechanisms in children and
psychopathological groups that may be related
to CNV, for example, defective attentional
processes. Timsit et al. also found no difference
in CNV amplitude between the psychiatric
patients and normal controls, whereas
McCallum and Walter (1968a) found lower
CNV amplitude in schizophrenics. In this
latter report, schizoid individuals gave CNVs
of average magnitude and high variability in
development. Meager evidence of CNV was
reported for autistic children (Walter, 1966a)
and for recidivist delinquents with psychopathic
personalities (Walter, 1964a, 1967).

Further work is needed for the proper
evaluation of CNV development in psychopathological
groups with careful monitoring of
eye movements (Straumanis et al., 1969). One
promising measure is the use of CNV prolongation
as an indication of extent of psychopathology
(Timsit et al., 1970).
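
One way such prolongation might be quantified is as the time the averaged waveform takes to settle back to base line after S2. The sketch below is an illustration under stated assumptions, not the measure used by Timsit et al. (1970): the function name, the 2-µV return criterion, the requirement that the return be sustained to the end of the record, and the sample waveforms are all hypothetical.

```python
def resolution_time_s(waveform_uv, s2_index, sample_interval_s, criterion_uv=2.0):
    """Seconds after S2 until the waveform stays within the criterion of base line.

    Base line is taken as 0 microvolts. Returns None if the waveform has
    not settled by the end of the record.
    """
    for i in range(s2_index, len(waveform_uv)):
        if all(abs(v) <= criterion_uv for v in waveform_uv[i:]):
            return (i - s2_index) * sample_interval_s
    return None


# Invented averaged waveforms sampled every 0.1 second; S2 occurs at index 2.
abrupt = [-10.0, -12.0, -12.0, -1.0, 0.0, 0.0, 0.0, 0.0]
prolonged = [-10.0, -12.0, -12.0, -9.0, -6.0, -4.0, -1.5, 0.0]

abrupt_time = resolution_time_s(abrupt, s2_index=2, sample_interval_s=0.1)
prolonged_time = resolution_time_s(prolonged, s2_index=2, sample_interval_s=0.1)
print(abrupt_time, prolonged_time)
```

A longer value for the second waveform corresponds to the slower resolution of CNV described above; any real criterion would need to be chosen with the noise level of the averaged record in mind.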


Pathophysiological Groups 


CNV has been studied in dyslexic children,
visually deficient children, and epileptics. In
dyslexic children, normal CNV appeared
when S1 and S2 were light and sound, respectively
(Cohen et al., 1965), but CNV
tended to disappear when S2 was a visually
presented word or a consonant-vowel-consonant
trigram (Fenelon, 1968). In the latter
omms CNV deficit was found in a 

“year-old, but not i - z 
nounced, long-lasti d d ites fie 
CNV 


2 Was a visual 
l., 1965). Th
(particularly for visual stimuli). In one petit
mal epileptic, CNV was recorded, and responses
to S2 were unimpaired during photic-induced
spike and wave activity (Winter,
1967). In a more extensive study of 12 centrencephalic
epileptics, spontaneous spike and
wave discharges made it difficult to evaluate
changes in CNV (Zappoli et al., 1969). The
disorganization of operant responses during
these generalized paroxysmal discharges
prompted the authors to speculate about
CNV disruption. In 3 of 11 patients with
focal epileptic discharges of temporal lobe
origin, there was sometimes “transitory E 
pression or considerable disorganization of
CNV [Zappoli et al., 1969, p. 663],” particularly
in the hemisphere not involved in
the epileptic discharges. Reduction in CNV
amplitude was found in a group of 20 patients
with various brain lesions, particularly on the
side of the head near the lesion (McCallum,
Walter, Winter, Scotton, & Cummins, 1970).

The study of CNV and individual difference
variables has not resulted in a consistent set
of findings, partly because of a failure to
control eye-movement potentials. Nevertheless,
this work has provided a promising step
toward the establishment of CNV as a useful
clinical tool, especially in subjects with
impaired attention and arousal mechanisms
(e.g., schizophrenics).


NEUROPHYSIOLOGICAL GENESIS OF CNV


The question of what brain mechanisms
are engaged in the origin and development of
CNV has not yet been answered with any
degree of specificity. Nevertheless, important
beginnings have been made in work on
direct recordings of CNV in man and animals
and in experiments on other slow cortical
potentials.


Cortical CNV in Man


As previously discussed, the observations
of Walter and his associates on recordings of
CNV in the human cortex indicated that scalp
CNV represents a composite electrical potential
based on frontal areas of cortex as well
as central regions (Walter, 1968). A comparison
of subdural (cortical) and scalp recordings
indicated that CNV spread from
anterior frontal cortex posteriorly to the
premotor area in approximately 1 second. In
the nonspecific areas of the frontal lobes,
CNV was electronegative with respect to
other areas of the cortex, depth locations of
the brain, and mastoid processes (Walter
et al., 1964, 1967). Walter's conclusion from
scalp and direct cortical recordings suggested
that CNV “is due to widespread depolarization
of the apical dendrites in the feltwork of the
upper layers of the frontal cortex [Walter,
1968, p. 373].”


CNV in Animals 


The first demonstration of CNV in animals
was carried out in rhesus monkeys who were
trained to lever-press to escape unavoidable
shock (S2) following a click or tone (S1)
(Low, Borda, Frost, & Kellaway, 1965; Low,
Borda, & Kellaway, 1966). When the shock
was either unavoidable or inescapable with
no discrete MR to S2 possible, CNV dropped
out (Low, Borda, & Kellaway, 1966). CNV
also appeared if fast responses to a non-
nocent S2 were rewarded by appetitive reinforcement
(dextrose pellets) (Low, Borda,
& Kellaway, 1966). In an apparent demonstration
of CNV in dogs, Kamp, van Rijn,
and Zwart (1969) reported a slow negative
shift prior to the animal's pressing a pedal
for food reward and another shift in the
interval between the MR and actual delivery
of food. The first potential was regarded as
the readiness potential of Kornhuber and
Deecke (1965); the second potential was
considered to reflect expectancy and was implied
to be CNV.

n another appetitive reinforcement — 

ent, which employed the Si-S MR e 
“igm of human CNV studies, Borda (1 2 
1970 demonstrated in rhesus monkeys two 
‘pes of slow negative potentials, one domi- 
nant in frontal areas of the cortex and xd 

inant in central regions. The negativity 


ring cor- 
Te ese potentials was elevated "ex 
(h Performance independent din terms 
or Ber) level, a finding interprete 


tention to task cues. The go etri 

ang Potential decreased during peo gen- 

= Was interpreted as possibly bemg such 
“ted by a diffuse subcortical system, 


as the midbrain reticular formation. On the 
other hand, the frontal-dominant potential 
did not decrease as long as performance level 
was high and was interpreted as more analo- 
gous to CNV as recorded in man and as reflect- 
ing “a basic mechanism of selective attention 
subserved by non-specific thalamo-cortical 
pathways [Borda, 1970, p. 179].” Borda suggested
that human CNV is not a unitary potential;
may represent the sum of potentials
of different subcortical origins; and may consist
of several waves, including one each for
attention, arousal, and motor activity.22
Several recent animal studies have led to 
important information about the cortical and 
subcortical distribution of CNV. For instance, 
Low (1969) utilized a click-tone procedure 
with rhesus monkeys (where a fast lever press 
to tone avoided shock) to demonstrate de 
amplitude CNV-like waves in frontal to 
parietal and frontal to sensory motor cortex 
placements. These waves were small in a 
sensory motor to parietal cortex recording. 
In addition, waves in the anterior areas had 
an early rise time. These findings confirm 
the prominence of CNV in anterior regions of 
the cortex. In a recent study, Rebert23
used an appetitive reinforcement situation involving
a continuous tone (S1), light (S2),
and a bar press (MR) to demonstrate CNV- 
like waves in cortical and subcortical areas 
of monkeys. An important finding was the 
appearance of large negative potentials with 
a fast rise time at midline thalamus, In con- 
trast, positive waves appeared at caudate 
nucleus, and no slow potentials of any signifi- 
cance were recorded from cerebral white 
matter, corpus callosum, pulvinar, hippo- 
campus, or pyriform cortex. Finally, place- 
ments (not yet histologically verified), aimed 
at the midbrain reticular formation and 
lateral hypothalamus, produced fast-rising 
negative and positive potentials, respectively. 
Negative waves were also recorded from 
motor and premotor cortex. These prelimi- 
nary results suggest involvement of both 
thalamic structures and the reticular forma- 


22 R. P. Borda, personal communication, May 21. 


23 C. S. Rebert, personal communication, January
29, 1971.


tion in the development of cortical CNV. In
another appetitive conditioning study, Donchin24
and co-workers trained young rhesus
monkeys either in a S1-S2-MR situation or
in a task where response to S1 was a key press
and response to S2 was release of the key. In
the latter situation, where the key was held
down during the interstimulus interval, a
positive-negative-positive waveform appeared
prominently over postcentral cortex and, with
reduced amplitude, over precentral motor
cortex. Negative waves were not clearly seen
in the frontal placement. On the other hand,
in the first experimental situation (paradigm
for CNV in man) sustained negativity similar
to human CNV was seen in the frontal placement
in the S1-S2 interval, and postcentral
negativity was delayed until the time of the
MR to S2. These findings indicate that CNV
in monkeys is maximal in precentral regions
of the cortex and that motor activity can
produce complex slow potentials in postcentral
cortex.

These experiments on animals suggest that 
scalp CNV in man is a heterogeneous wave 
complex that is derived from frontal and 
central areas of cortex. Subcortical generator
mechanisms seem to include both the reticular
formation and nonspecific thalamic structures.


Other Slow Cortical Potentials in Animals 


The study of slow cortical potentials (other
than CNV) in animals consists of a large and
complex body of findings. Several comprehensive
reviews of this work are available (Brazier,
1963; O'Leary & Goldring, 1964; Rowland,
1968), and only general findings germane
to CNV work are discussed here. One important
finding in this animal work is that
slow cortical potentials are mediated by the
reticular formation (e.g., Arduini, 1958;
Caspers, 1963). Another general finding is
that thalamic nuclei are also involved in the
genesis of slow cortical potentials, which can
have an appearance (based on lateral asymmetry)
different from those related to the
midbrain reticular formation (Arduini, 1958).


"e Donchin, D. Otto, L. K. Gerbrandt, and 
K. H. Pribram, While a monkey waits. Paper
presented at the meeting of the Western
Psychological Association, Los Angeles, April 1970.


JOSEPH J. TECCE 


Regarding these two subcortical structures, 
Lindsley (1960) has emphasized the differ- 
ence between the tonic or longer lasting 
arousal effects of the ascending portion of the 
midbrain reticular formation and the more 
transient arousal effects of the diffuse thalamic 
projection system. Similarly, Rowland dis- 
tinguished between shifts in steady potentials 
that terminate relative to an evocative stimulus 
and that may be generated by "specific sys- 
tem projections in cortex" versus steady po- 
tential shifts that outlast the stimuli and that 
may be generated by “diffuse activation system
projection [Rowland, 1968, p. 54].”
These views give rise to the interesting ques- 
tion of what relationship, if any, CNV 
(usually stimulus-locked and short-lasting) 
bears to subcortical systems assumed to have 
transient (phasic) and persisting (tonic) 
arousal effects. 

Further specification of what brain mecha- 
nisms are involved in CNV development might 
be facilitated by the use of pharmacological 
agents. There is evidence of drug effects on
steady potentials in animals (O'Leary &
Goldring, 1964). For example, negative shifts in
the EEG base line have been recorded from
cat cortex following administration of
amphetamine (Norton & Jewett, 1965), procaine
(Goldring, Metcalf, Huang, Shields, &
O'Leary, 1959), and chlorpromazine (Norton
& Jewett, 1965), and positive base-line shifts
have been associated with the use of
pentobarbital (Goldring et al., 1959; Goldring &
O'Leary, 1960), thiopental (Norton & Jewett,
1965), and benadryl (Goldring et al., 1959).
At present, very little data are available on
the relationship between CNV and
psychoactive drugs.

In contrast to most work on slow cortical
potentials, which emphasizes both arousal
processes and subcortical mechanisms,
Roitbak (1968) has suggested that such
potentials are due to depolarization of apical
dendrites in the cortex and presynaptic
inhibition of pyramidal neurons, which may be
the basis for cortical inhibitory processes.
There are experimental findings to support
this view, for instance, the demonstration
that slow cortical negative potentials
were related to decreased firing rates of
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cortical neurons (Fromm & Bond, 1964). In 
a discussion of Roitbak's emphasis on cortical
inhibitory processes, Walter (1965a) has
speculated that CNV might reflect an inhibitory
function since motor restraint occurs
during the S1-S2 interval up to the MR to S2.
Walter described this inhibitory process as
analogous to an alarm clock that is set and
delays (inhibits?) its waking effect until the
set time when it goes off. In the same
discussion, Walter rejected the usefulness of this
“paradoxical” use of inhibition and implied
that terms like “excitation” and “inhibition”
are too simple to explain complex brain
processes involved in CNV development. An
additional thesis proposed by Walter is that
the frontal lobes, where CNV occurs, are the
brain sites where the processes of selection
(attention) and discrimination occur (Walter,
1965a). Support for his proposal is available
from ablation studies in monkeys (Malmo,
1942; Wade, 1947) and in baboons (Pribram,
1950), which indicated that bilateral removal
or lobotomy of the frontal lobes can produce
highly distractible animals who have difficulty
attending to the experimental task.
In addition, neurological findings have
indicated that patients with frontal lobe damage
appear to be characterized by disturbances
of both attention and arousal. In a
recent review of frontal lobe work, Livingston
(1969) concluded that selected areas of
medial and orbital frontal cortex are related
to emotional arousal. Thus, the frontal areas
of the brain, where CNV appears prominently,
appear to be significantly involved in
attention and arousal functions.

In summary, CNV occurs prominently in
the frontal and central areas of cortex and
appears to emanate from apical dendrites in
the upper layers of cortex. The development
of CNV appears to depend on subcortical
mechanisms involved in arousal functions,
such as the reticular formation and
thalamic structures.
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terms of four theoretical concepts, namely, 
expectancy, conation, motivation, and attention.
The four hypotheses based on these
concepts are evaluated below, and a
two-process theoretical model involving attention
and arousal functions is proposed to account
for CNV findings.


Expectancy Hypothesis 


In simplified form, this hypothesis states
that CNV amplitude varies directly with the
subjective probability or expectancy that S2
will follow S1. According to this view, S2
withdrawal in the equivocation and extinction
procedures reduced the relative certainty or
expectancy that S2 would follow S1 and,
therefore, reduced CNV amplitude. For Walter,
the human brain, especially the frontal lobes
where CNV is prominent, functions like “a
computer of probability, or more properly
contingency [Walter, 1965a, p. 4].” In view
of the many changes in CNV amplitude
reported where the statistical association of S1
and S2 was unchanged, the expectancy
hypothesis, though important in early CNV
work, is an incomplete account of CNV
findings.


Conation Hypothesis


There is considerable evidence from work
on response variables that conation, or the
intention to perform an act, is a main
determinant of CNV development (Low, Borda,
Frost, & Kellaway, 1966). For instance, a
highly reliable effect is the elevation of CNV
amplitude accompanying the occurrence of a
MR to S2. In addition, CNV magnitude was
found directly proportional to the amount of
anticipated force needed for a MR to S2 and
was larger when the MR effectively
terminated S2, compared to when the MR had no
such effect. Additional support for the
response-intention or response-preparation view
is found in the Sidman avoidance situation,
where CNV-like waves appeared before the
subject's MR without the benefit of external
signals like S1 and S2 (Low, Borda, Frost, &
Kellaway, 1966), although, as previously
noted, this paradigm lends itself to the use
of internal cues as S1 and S2. These findings
have been interpreted in terms of conation
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theoretical model proposed to account for CNV 
changes. Attention is positively and monotonically 
related to CNV magnitude. Arousal level is
nonmonotonically (inverted U) related to CNV
magnitude.


(the intention or conscious drive to perform 
a voluntary action) and preparation set, and 
CNV has been conceptualized as conative 
negative variation, with Walter’s original 
acronym being retained (Low, Borda, Frost, 
& Kellaway, 1966). These results provide 
convincing evidence that response intention 
is important in CNV development. On the 
other hand, there are findings that do not
fit this view. For instance, both the appearance
of CNV and changes in CNV amplitude
have been reported in the absence of a MR
to S2. In addition, systematic changes in CNV
amplitude were reported in work on stimulus
factors without apparent changes in the MR.


Motivation Hypothesis 


The findings of several experiments have
suggested that CNV development is related
to motivation as conceptualized in the Hullian
tradition (Hull, 1951). This view proposes
that CNV amplitude varies directly with the
subject's level of motivation and is supported
by a variety of results, such as increased
CNV amplitude accompanying both anticipation
of increased muscular effort and increased
effort to detect S2. A number of other results
can be reinterpreted as supporting the
motivation view, for instance, elevated CNV
amplitude both with a MR to S2 and
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EP increases that are associated with CNV 
occurrence does not fit a general motivation 
view. In the sense that motivation has been 
conceptualized as having both dynamogenic 
and directional functions (Cofer & Appley, 
1964; Hebb, 1949, 1955; Young, 1961), it in- 
cludes aspects of both arousal and attention 
processes and might suitably be replaced by 
these concepts. 

In conclusion, hypotheses based on ex- 
pectancy, conation, and motivation account 
for many important findings on CNV, al- 
though there are some data that do not fit 
these views. 


A Proposed Theoretical Model


Recent suggestions have been made that 
CNV development is related to two separate 
but related psychological processes—attention 
and arousal (Tecce, 1970, 1971). Based on 
the present review, a two-process explanation 
is presented to account for CNV findings. 

I. Attention Hypothesis: The magnitude of
CNV bears a positive monotonic relation to
attention. The clearest relationship that
emerges from studies of CNV is that CNV
magnitude bears a positive, monotonic relation
to attention to the experimental task.
This hypothesis is presented schematically in
the left half of Figure 6 (no assumption of
linearity is intended). Attention is considered
a hypothetical organismic process,
characterized by steering functions, which facilitates
the selection of relevant stimuli from the
environment (internal or external) to the
exclusion of other stimuli and results in
response to the relevant stimuli. The selective
processing of information in S2 is a key
property of the attentional process. This
definition of attention in terms of cue or
directional properties has been previously made
(e.g., Berlyne, 1969, 1970; Hebb, 1958).
The attention hypothesis has been supported
by a number of experiments that have
demonstrated significant relationships between
CNV and variables involving changes
in attention to S2. One group of highly reliable
findings involves the reduction in CNV
amplitude by distraction, both discrete
(phasic) and sustained (McCallum & Walter,
1968; Tecce & Scheff, 1969). This “distraction
effect” is clear-cut when there is objective
evidence that (a) processing of information
contained in the distracting stimulus has
occurred (e.g., distracting stimuli can be
remembered); and (b) processing of information
contained in S2 has been impaired (e.g.,
RT to S2 is lengthened). In addition to the
distraction effect, there are other results that
support the attention hypothesis. For instance,
elevated CNV was found when detection
of a barely audible S2 was required (Low
et al., 1967; Rebert et al., 1967) and when
practice with S1-S2 pairs preceded CNV
development (Hillyard & Galambos, 1967).
Increased CNV amplitude was also found in
situations where response requirements
presumably heightened attention to S2, such as
when a MR to S2 occurred (compared to no
MR) (e.g., Low, Borda, Frost, & Kellaway,
1966; Straumanis et al., 1969; Walter et al.,
1964); when the MR to S2 was instrumental
(compared to when it had no effect on S2)
(Peters et al., 1970); and when speed of
response to S2 was fast (Connor & Lang,
1969; Lacey & Lacey, 1970; Tecce & Scheff,
1969; Waszak & Obrist, 1969). The findings
of increased amplitude in selected EPs
occurring with CNV appearance (where stimuli
producing the EPs and CNV were similar)
(Ellis, 1969) and a relationship between CNV
and a late component of EPs (P300)
(Donchin & Smith, 1970; Hillyard, 1969)
have also been associated with attention
functions.
Other studies based on less uu 
results have shown (4) elevated C zm bs 
"de and enhanced attention to Se W e 
Use of attractive, novel, and complex Lucre 
instructions for subjects to an 
ard on S, (McCallum & Walter, Tip ais. 
e reduced CNV amplitude in di ed E 
timination tasks which were wg zh 
“gendering distraction (Dase ot plitude 
in note 7); and (c) reduced CN = pm 
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Practerized by ease of dis 3 
üs Children, highly anxious d 
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71), This body of findings m i 
attention to S, is a primary aee of 
Correlate of CNV development. 


tractibilit 


eii ty 
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the increased CNV amplitude found to ac- 
company anticipation of high muscular effort 
for response to S2, it is necessary to consider
attention to motor as well as to sensory
(S2) aspects of the task in the genesis of 
CNV. The importance of the balance between 
sensory and motor sets in constant-foreperiod 
RT experiments (Woodworth & Schlosberg, 
1954) suggests a variable of possible import- 
ance for further evaluation of the attention 
hypothesis. 

Two lines of reasoning suggest that any 
explanation of CNV findings would be in- 
complete without a consideration of arousal 
processes. First, the work on CNV and other 
slow cortical potentials in animals has in- 
volved subcortical mechanisms associated with 
arousal functions, such as the reticular forma- 
tion and the thalamus. Second, the con- 
stant-foreperiod RT paradigm, which is gen- 
erally used to produce CNV, involves the 
predictable occurrence of a stimulus event 
(S2) and, therefore, the likelihood of
increased phasic arousal (within the S1-S2
interval) in anticipation of that event. This
point has been emphasized in work on sensory 
evoked potentials (Näätänen, 1967). 

In the present discussion, arousal is con- 
sidered devoid of the steering properties of 
attention and is conceptualized as a hypo- 
thetical process that energizes behavior un- 
selectively and affects only intensity of re- 
sponse. This emphasis on dynamogenic prop- 
erties in the definition of arousal has been 
previously made (Berlyne, 1960, 1969, 1970; 
Duffy, 1957, 1962; Freeman, 1948; Hebb, 
1955, 1958; Lindsley, 1957, 1960; Malmo, 
1957, 1958, 1959). The concept of arousal 
has been used synonymously with terms like 
activation, excitation, and energy mobiliza- 
tion (Duffy, 1962), and no differentiation 
between these concepts is intended here. The 
distinction between attention and arousal on
the basis of directional or steering properties
and energizing or dynamogenic properties,
respectively, is consistent with the views of
both motivation and activation theorists. For
example, in two major works on drive theory,
Brown (1953, 1961) specifically excluded
directional properties from the concept of
drive. Similarly, activation theorists have
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emphasized the direct exclusion of selection 
or steering functions from arousal (Duffy, 
1957, 1962; Hebb, 1955, 1958; Malmo, 1957, 
1958, 1959). Lindsley has differentiated 
"general alerting or readiness and specific 
alerting or focused attentiveness [Lindsley, 
1957, p. 73]," the latter being more con- 
cerned with selection of stimuli. Based on 
these distinctions, a second hypothesis based 
on the concept of arousal is proposed to ac- 
count for CNV findings. 

II. Arousal Hypothesis: The magnitude of
CNV bears a nonmonotonic (inverted-U)
relation to arousal level. There are two groups
of CNV findings that, considered together,
support the view that CNV magnitude bears
a complex, inverted-U relation to arousal
level. In the first category are those findings
that show increased CNV magnitude to be
associated with increased attention to the
experimental task. Experimental conditions that
have been interpreted as producing heightened
attentiveness to S2 could also be inferred
to reflect increased phasic arousal
occurring within the S1-S2 interval, for
instance, detection of barely audible stimuli as
S2 and the presence of a MR to S2 compared
to its absence. Consequently, many of the
findings that fit the attention hypothesis also
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group of findings suggests an inverse rela- 
tionship between CNV magnitude | and 
changes in arousal level. This association is 
reflected in the descending (right-half) por- 
tion of the inverted-U curve in Figure 6. 
Considered together, these two groups of 
findings indicate that CNV magnitude bears 
a complex, nonmonotonic relation to arousal. 
This ubiquitous U curve has been previously 
described for the relation of arousal to be- 
havior (Duffy, 1941, 1957; Freeman, 1948; 
Hebb, 1955; Malmo, 1959; Yerkes & Dodson, 
1908). 
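
The two hypotheses can be illustrated with a minimal numerical sketch. The functional forms below (a saturating curve for attention and a parabola for arousal), the 0-to-1 scales, and the assumed optimum arousal of 0.5 are hypothetical choices made only for illustration; the text specifies monotonicity and an inverted-U shape, not these equations.

```python
import math


def attention_effect(attention: float) -> float:
    """Hypothetical positive, monotonic relation between attention
    (scaled 0..1) and relative CNV magnitude. The saturating form is an
    assumption; the hypothesis requires monotonicity, not linearity."""
    return 1.0 - math.exp(-3.0 * attention)


def arousal_effect(arousal: float) -> float:
    """Hypothetical inverted-U relation between arousal level (scaled
    0..1) and relative CNV magnitude, peaking at an assumed optimum
    of 0.5 and falling off symmetrically on either side."""
    return 4.0 * arousal * (1.0 - arousal)


if __name__ == "__main__":
    # Tabulate both curves at a few illustrative levels.
    for level in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"level={level:.1f}  "
              f"attention: {attention_effect(level):.2f}  "
              f"arousal: {arousal_effect(level):.2f}")
```

In this sketch a rise in attention always raises the predicted CNV magnitude, whereas a rise in arousal raises it only below the assumed optimum and lowers it above that point.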
There are several possible difficulties with
the arousal hypothesis as expressed in an
inverted-U function. One problem is that it
can explain both positive and negative
relationships by its ascending and descending
portions, respectively; consequently, it lends
itself to a loose usage by the arbitrary
designation of arousal levels after experimental
findings are known. One solution to this
problem is the specification of arousal levels
before an experiment. A second problem with
the arousal hypothesis is also a definitional
one. In a scholarly critique of activation
theory, Lacey (1967) has rejected the notion
of a unitary concept of arousal by pointing
out dissociations between autonomic,
electrocortical, and behavioral measures of arousal.
While recognizing the importance of these
dissociations, Berlyne (1967, 1969) has
suggested that general arousal is, nevertheless, a
useful concept. In addition, there is evidence
of “concordant changes in EEG and
peripheral physiological measures [Malmo &
Bélanger, 1967, p. 309]” based on relatively
long time segments (minutes). These authors
reserve the term activation (arousal) for
such tonic shifts in level of activity. In
further CNV work, this distinction between
independent measures of phasic and tonic
arousal may be useful.
The problem of conceptual oversimplification
is applicable to attention as well as to
arousal, and subcategories of attention have
been described (Berlyne, 1969, 1970). As
previously discussed, one distinction in the
classical CNV paradigm involves attention to S2
and attention to motor aspects of the task. A
second possible differentiation involves a
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time dichotomy between the interstimulus 
interval and the brief period following
occurrence of S2. That is, the attention to S2 that
occurs in phase with CNV between S1 and
S2 might be distinguished from that aspect of
attention which begins with S2 and involves
information processing. A similar distinction
can be made between anticipatory arousal
(within the S1-S2 interval) and reactive arousal
(in response to S2) (Karlin, 1970).

In conclusion, the two hypotheses proposed 
are intended to be an explanation of best fit 
for presently available CNV findings.** Clearly, 
the complexity of experimental data suggests
that processes other than attention and
arousal may be involved in CNV development
(Dargent & Dongier, 1969). Consequently,
the present viewpoint is presented as
tentative and subject to revision as new 
data become available. 


CONCLUSIONS 


1. CNV is a slow, surface-negative electrical
potential of the human brain that can
be recorded on the scalp independent of
extracerebral sources.
2. With scalp recordings, CNV amplitude
is highest at vertex with antero-posterior and
lateral gradients of amplitude diminution.
With direct recordings, CNV amplitude is
highest in frontal areas of the human brain.
3. The development of CNV is optimum
with a constant-foreperiod reaction-time
paradigm. A preparatory stimulus (S1) is followed
by an imperative stimulus (S2), to which a
motor response (MR) is made. However, CNV
occurs in the absence of a MR to S2.
4. The development of CNV is most clearly
related to the psychological processes of
attention and arousal, although other factors
may also be involved. One possibility is that
early CNV may be closely related to arousal
processes while the primary, although not
exclusive, functional significance of late CNV
is that of facilitating attention to S2.
Nevertheless, CNV is clearly a heterogeneous
electrical brain wave that is not entirely reducible
to simple, dichotomous theoretical constructs.

** The two-process model described here to
account for CNV development has also been proposed
to account for effects of drugs on the behavior and
psychophysiologic responses of schizophrenics (Tecce
& Cole, in press). Performance is conceptualized
as both a positive, monotonic function
of attentiveness to the experimental task (and
freedom from distraction) and a nonmonotonic
(inverted-U) function of arousal level. Schizophrenics,
like other psychiatric patients, may be characterized
as having defective attention and arousal
mechanisms (Malmo, 1959).

5. CNV is related to other kinds of elec- 
trophysiological activity, notably autonomic 
functions and slow cerebral potentials as- 
sociated with voluntary motor movements. 
In the usual S1-S2-MR paradigm, CNV and
a motor readiness potential combine to yield
a hybrid wave or “CNV complex,” further
suggesting the heterogeneity of CNV as 
typically measured. In this situation, CNV 
is probably made up of multiple waves. 

6. A satisfactory understanding of CNV
requires further elucidation of its relationship
to background EEG and evoked potentials
to S1, S2, and stimuli interpolated
within the S1-S2 interval.

7. Although CNV is a genuine measure of 
slow electrical brain activity, scalp potentials 
generated by eye movements can easily inter- 
fere with its accurate measurement and are 
considered the bête noire of CNV research. 
In studies where eye movements have not 
been recorded, the possible influence of ocular 
potential artifacts shrouds clear and un- 
equivocal interpretation of results. 

8. Another source of obfuscation in the 
interpretation of CNV findings is the variety 
of methodologies used, sometimes unneces- 
sarily, in different experiments. These pro- 
cedural differences make comparative evalua- 
tion and generalization of experimental find- 
ings difficult and suggest the need for stan- 
dardization of selected procedures. 

9. There are suggestions that CNV is po- 
tentially a useful clinical tool, although eye 
movements hover over research on children 
and psychiatric patients as a particularly 
serious methodological problem. 

10. Present notions about the neurophysi- 
ological genesis of CNV involve both cortical 
and subcortical structures, such as fronto-cortical
areas, thalamo-cortical circuits, and
the brainstem reticular formation. 
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11. Research areas of importance for the 
better understanding of CNV and brain-be- 
havior relations are the mapping of CNV in 
implanted animals; the study of CNV changes 
as a function of psychoactive drugs, including 
stimulants (e.g., amphetamine) and depressants
(e.g., chlorpromazine); the study of
verbal stimuli as S1 and S2, especially in
conjunction with the lateral (hemispheric)
distribution of CNV; and the study of operant
conditions for controlling CNV and behavior,
such as performance-related rewards and
biofeedback.
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Block's critique of an article by Bentler, Jackson, and Messick hypothesizing 


separate processes of acceptance acquiescence and agreement
acquiescence is discussed. A number of logical contradictions in
Block's arguments are noted, including his strong advocacy of
methodological control of response styles, while simultaneously
denying their existence or importance. In addition, his technical
comments regarding factor analysis are found to be in error, and
his conjectures regarding the nonsignificance of factors not
supported. In contrast to Block's deprecation of the value of
studies of the nature of obscuring variance, it is recommended that
acquiescence processes and other response styles require
investigation, understanding, and control for the optimal
measurement of personality.


Block (1971) seems to be presenting two
major arguments concerning our article
(Bentler, Jackson, & Messick, 1971), one
questioning the reality and importance of
agreement and acceptance acquiescence, the
other acknowledging the existence of “large,
reliable but variously based sources of
variance [p. 208]” and recommending procedures
for reducing “the obscuring effects of such
response styles [p. 210].” Thus, although it is
not obvious from his abstract, Block is in
agreement with one of the major conclusions
of our study: Response styles can
obscure or drastically modify the observed
interrelationships of content traits, and this
variance ought to be identified and controlled.

Component and factor-analytic m s


In view of the difficulties in defining con- 
tent, it might have been constructive for 
Block to have offered his own concrete defini- 
tion. Block completely avoided dealing with 
this major and critical problem, thus providing 
no assistance to researchers attempting to ex- 
plain results unexpected from a content posi- 
tion, such as the case cited in our review (p. 
201) where three presumably separate “content”
traits intercorrelated about .86 on the
average. Instead, he attacked our formulation
of content as “patent” and “naive,” but
offered no alternative criteria for distinguishing
response style from content, referring only
to a quarter-century-old article by Meehl
(1945), while decrying the lack of progress.
More recent work (Jackson, 1971) has sought 
to clarify the nature of the response process to 
scale content and has suggested that great 
progress has been made in the past 25 years 
in the understanding of the substantive com- 
ponent of validity. Indeed, personality psy- 
chology has outgrown its dependence upon 
radical empiricist denial of the need to under- 
stand the basis for responding and the casual 
ad hoc acceptance of items bearing neither 
a theoretical nor structural link to the rest 


of the scale. 


sideration of response-style-trait interrelations from
the forced uncorrelated form of scores corrected by
partial correlation procedures (Messick, 1962).
Similarly, Block's advocacy of procedural control is
consistent with our well-established position.
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Although Block strongly advocated control 
of response styles, a major portion of his 
critique was directed at disputing the evidence 
we reviewed. We examine Block's points in 
outline form.

1. Block has made some unfortunate mis- 
interpretations and errors in his discussion of 
the Morf and Jackson (1972) study. That 
study sought to represent separate sets of 
content and response-style factors by a facet 
design in which items drawn from specific 
content pools were varied in format and 
related to special marker scales. Items moder- 
ate in content saturation were used explicitly 
to yield an item pool similar to old-fashioned 
but popular instruments, such as the MMPI. 
Block dismisses the use of such items as look- 
ing for acquiescence "under artificially con- 
strained and irrelevant circumstances rather 
than in typical inventory domains where 
acquiescence first was sighted [p. 208]." 
Reminiscent of his earlier (Block, 1965, p. 
118) ad hominem reference to response-style 
researchers as displaying “ignorance,” Block 
now evokes an image “of the drunk who, 
having lost his wallet in a dark alley, pro- 
ceeded to look for it under a convenient street 
light . . . [p. 208]." The item pool for the 
Personality Research Form (PRF) (Jackson, 
1967a), from which the Morf and Jackson 
items were drawn, was substantial in size 
(2,554 items were analyzed empirically). It 
is important to establish whether the novel,
explicit statistical procedures for suppressing
response styles in PRF item selection were
more effective than the use of nonpurposeful
- era oe undertaken in a series 
Jackson, 1967) and ene A eral 
as ioes. Mor psychometric (Jackson & 
& Jackson 1970) ees "i Wel 
approaches. Neither Block's
implication of a blind search, nor his implication
that suppressing response-style variance
in constructing personality tests is a typical
procedure, are warranted.
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publication of their definitions (Murray, 
1938), and in view of the substantial con- 
vergent and discriminant validity that has 
been reported (Jackson, 1967a; Jackson & 
Guthrie, 1968; Kusyszyn, 1968) for PRF
scales derived from the same item pool.

3. Block is seriously in error in his inter- 
pretation of factor theory and in his sugges- 
tion that the Morf and Jackson factor analy- 
sis is “technically faulty in a fundamental 
way [p. 208].” He states that reliability sets 
the upper bound for communality, and he 
raises questions about Morf and Jackson's
reporting communalities higher than reliabilities.
But, according to elementary factor theory,
the reliability that sets the upper bound for
communalities is defined as “The correlation
between two administrations of the same test
or between two tests that are designed to be
parallel [Thurstone, 1947, p. 83],” and not as
the Kuder-Richardson Formula 20 (KR-20)
as presumed by Block, which may indeed be
exceeded by communality, particularly in the
case of factorially complex tests.

Block's intimation that the item pools from
which the Morf and Jackson items were
extracted represent “random variance” is not
factual. The reported reliabilities for the short
scales are, in general, consistent with
expectations from application of the Spearman-Brown
formula to the four longer PRF scales from
which they were derived, where the median
KR-20 was .915.
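
The Spearman-Brown expectation referred to above can be sketched as follows. Only the median KR-20 of .915 comes from the text; the length ratios are hypothetical, since the lengths of the short and long scales are not stated here, and the function name is an illustrative choice.

```python
def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Spearman-Brown prophecy formula: predicted reliability of a test
    whose length is changed by `length_ratio` (new length divided by
    old length), given the reliability of the original test."""
    return (length_ratio * reliability) / (
        1.0 + (length_ratio - 1.0) * reliability
    )


if __name__ == "__main__":
    full_scale_kr20 = 0.915  # median KR-20 of the longer PRF scales (from the text)
    # Hypothetical shortening ratios; the actual item counts are not given here.
    for ratio in (0.5, 0.25):
        predicted = spearman_brown(full_scale_kr20, ratio)
        print(f"length ratio {ratio}: predicted reliability {predicted:.3f}")
```

Under these assumed ratios the predicted reliabilities of the short scales fall below .915 but remain well above zero, which is the sense in which short-scale reliabilities can be "consistent with expectations" rather than evidence of random variance.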

Block proffers an interpretation that the
Morf and Jackson factors are due to chance,

5 Block focused on two items that had been
miskeyed in an earlier analysis of the data [M. E. Morf,
An analysis of two response styles: True responding
and item endorsement. Unpublished doctoral
dissertation, University of Western Ontario, 1969],
implying that these were “only one of many
possible examples.” The correction of these and
other miskeyed items in a new analysis prior to
publication of the Morf and Jackson study
resulted in factors no less clear than those previously
identified.

This should not be surprising to Block. In an
extended replication of Block's (1965) analysis,
Messick reported that for Block's EC-4 scale, a
relatively “balanced” scale of the second largest MMPI
factor, the true- and false-keyed subscales
^A Te 07, and had KR-20 reliabilities Of -e - 
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156. pective commu 
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REPLY TO BLOCK 


citing the explorations of Horn (1967)
and of Humphreys, Ilgen, McGrath, and
Montanelli (1969). But a careful reading of
the more recent article reveals that one of
the most potent determinants of factor
reliability is a sufficient number of tests defining
each factor. Humphreys et al. recommended
at least four tests for defining each factor,
but Morf and Jackson had over 40 defining
the two acquiescence dimensions. A test of
the psychometric significance of the factors
(Jackson & Morf, 1971) involving separate
and completely independent analyses of two
sets of tests, comprising split halves of the
tests defining each factor, were undertaken
and rotated independently to a criterion
similar to that of Clustran (Bentler, 1971). The
values, corresponding to split-half reliabilities
of factors derived from correlating factor
scores for seven factors from these separate
analyses, were each significant at well beyond
the .0001 level.

4. Block pretends that the Bentler (1969)
semantic differential study was not replicated
in the domain of personality assessment, but
an entirely successful replication was cited
(Bentler, Jackson, & Messick, p. 195), so that
Block's primary criticism of that study was, at
least, misleading. Block cannot logically have


$ le, 
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Were Y three tests, and hence when these ea ae 
Pecteg vided into two sets, they could m. Aer’ 
order to define common factor variance. ecd ie 
yon Intercorre]ations, however, Were signi ica 

« dh ees res derived 
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it both ways in his claim that what is par- 
tialed out in the Bentler study is “unknown” 
while finding it “not surprising” that its removal
will “result in content-meaningful rela-
tionships [p. 208]." What theory other than 
response style would lead one to expect (a) 
positive correlations among all the variables 
in the two studies as well as (b) high nega- 
tive correlations among polar opposite scales 
after the removal of the effects of the total 
number of adjectives checked? Block has 
provided no explanation for the dramatic 
effects reported. 

5. We submit that findings such as those 
reported by Bentler and Marshall demon- 
strating the superiority of a polar-opposite 
presentation format for adjectives over a 
traditional single-stimulus format are not 
“matters of low priority [p. 209].” If this ef- 
fect might have been predicted, and has been 
well known for years, where are citations 
to similar studies? Where is the evidence 
among published personality tests that this 
finding has permeated the field of assessment? 

6. In reply to Block’s desire to have seen 
a more discursive treatment of acquiescence 
in relation to inventories like the MMPI, it 
is sufficient to state the following: In no 
fewer than 15 separate analyses of diverse 
populations, a factor separating true- and 
false-keyed subscales has appeared. This 


9 Prison sample (Jackson & Messick, 1961); Pennsylvania
State University and Hospital samples
(Jackson & Messick, 1962); University of Oregon
and University of Minnesota samples (four analyses
of original and reversed scales) (Jackson & Messick,
1965; Rorer & Goldberg, 1965); analyses of four
random samples of University of Western Ontario
original and reversed scales [D. N. Jackson. A
threshold model for stylistic responding. In C. Hanley
(Chm.), New interpretations of response style and
content in personality assessment. Symposium presented
at the meeting of the American Psychological
Association, San Francisco, September 1968];
reanalyses of separate male and female samples of
alcoholic patients of Horn [D. N. Jackson. Discussion.
In G. Dahlstrom (Chm.), Symposium on the
screening of alcoholics. Presented at the annual
meeting of the American Psychological Association,
Miami Beach, September 1970]; replication of Block
analysis on Samples J and D [S. Messick. Psychology
and methodology of response styles. Paper presented
at the annual meeting of the Western Psychological
Association, Honolulu, Hawaii, June 1965].

Block’s (1965) conjecture that this factor was due 
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finding has occurred with such regularity 
that a mathematical formulation is possible 
(Rogers).10

7. Finally, Block's criticisms imply a phi- 
losophy of science alien to that of many 
researchers. He uses an odd time scale in 
suggesting that our interpretation of acceptance
acquiescence was invoked “post hoc.”
It was first proposed on the basis of our 
(Jackson & Messick, 1965) reanalysis of the 
Rorer and Goldberg (1965) data, but formal 
publication was delayed for 6 years while 
fresh data were evaluated. It is good scien- 
tific practice to formulate hypotheses on the 
basis of previous findings and to evaluate 
them with new data. Similarly, Block's reason- 
ing that because acceptance acquiescence was 
hypothesized, there was an acknowledgment 
of the “inadequacy” of previous conceptuali- 
zations that “closes a chapter in the history 
of personality assessment [p. 205]” is unnecessarily
pejorative, reflecting an insufficient
appreciation for the inevitable successive
approximation inherent in science. The refinement
of acquiescence by hypothesizing two
processes sought to incorporate all of the evidence,
something the critics of response-style
formations have never done. Neither have any
convincing counterarguments appeared from
any impartial source to the serious methodological
criticisms made (Bentler, 1966;
Jackson, 1967b) of the Block (1965)
monograph.


Following his arguments denying the evi- 
dence for two acquiescence processes, Block 
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asserts that such large sources of stylistic 
variance exist, and presents suggestions for 
controlling such response styles. If this is not
a pure and simple contradiction, then one
must assume that the issue devolves upon
one of semantic taste. What we have referred
to as distinct species of acquiescence, Block
prefers to call response styles. Just as the
fragrance of a rose is not affected by its
name, neither are the psychometric properties
of response styles a function of what they
are called. But understanding is important.
Block’s suggestion that “it is not necessary
to study why certain response tendencies
obscure relationship [p. 210]” implies an
unwillingness to come to grips with the core
of this controversy, namely, an understanding
of what is being assessed, how it may be
identified, and the means by which extraneous
“noise” may be controlled. Without
understanding, attempts at identification and
control of response styles will flounder:
content will remain difficult to verify, and
controversy will persist.
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Research studies dealing with psychopathology in married couples are re- 
viewed. Topics included are incidence of mental disorders among the various 
marital status groups, neurosis and psychosis in marital partners, disturbance 
in marital interaction, and the patient's spouse. The incidence of mental disorders
is lower in married couples than in any other marital status group.
When mental disorders do occur among the married, both partners are likely
to manifest some degree of disturbance. The spouse is affected not only by the
partner's disorder but also by the partner's treatment and hospitalization.
Marital interaction may contribute to the development of psychopathology in
married couples, but most of the findings in this area tend to be nonspecific
and of a post hoc nature. Improvements in the research design and methodology
of studies of marital interaction are suggested.


Psychopathology in married couples has 
been described as a “difficult and uncharted 
terrain [Post & Wardle, 1962, p. 153].” The 
present study represents an attempt to chart 
the "terrain" in the sense of surveying and 
integrating the research that has been done 
and, hopefully, providing some guidelines for 
further exploration. The field of marital health 
has a fairly large body of research associated 
with it, even though it was not until 1967 
that Vincent suggested that marital health be 
recognized as a separate and distinct field. 


MARITAL STATUS AND INCIDENCE OF
MENTAL DISORDERS


One of the most common measures of psychopathology
in married couples is the incidence
of mental disorders among the mar-


€ incidence for 


incidence of 
are presented in Table 1 
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missions. The actual figures are not presented 
here since there is little uniformity in the
various investigators’ reports; that is, some
report total number of patients in each marital
status category, others report percentages
of distributions, and some report rank order
only. (The actual figures from some of the
studies are presented in summary form in
Rose & Stub, 1955.)
As Table 1 indicates, the incidence of mental
disorders is generally lowest among married
persons, intermediate among the widowed
and the single, and highest among divorced
persons. The fact that married persons are
less likely to be admitted to mental hospitals
than are other marital status groups is
demonstrated in a more recent study (Fe...,
Kligler, Zwerling, & Mendelsohn, ...).
Age, sex, and color differences must be
taken into account in interpreting the relationship
between marital status and mental
disorders. Adler (1953) reported that rates of
mental disorders for married persons were
lower than those for single and ... persons
even with the age factor taken into account.
In regard to sex differences, Thomas
and Locke (1963) reported comparable rates
for both sexes in each of the marital status
categories. Malzberg (1964), however, found
higher rates among unmarried males than
among unmarried females, though the opposite
was true in the married group, that is,
higher rates for married females than for
married males. The trends in Black
first admissions to New York state hospitals
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TABLE 1 
MARITAL STATUS AND INCIDENCE OF MENTAL DISORDERS

                                                        Rank order of incidence
    Investigator             Locality       Time period    M    S    W    D

 1. Dayton, 1936                            1928-1932      1    2    3    4
 2. Dayton, 1936                            1928-1932      1    2    3    4
 3. Landis & Page, 1938      Entire U. S.   1933           1    3    2    4
 4. Dayton, 1940             Massachusetts  1917-1933      1    2    3    4
 5. Malzberg, 1940                          1929-1931      1    2    3    4
 6. Odegaard, 1953                          1931-1945      1    3    2    4
 7. Adler, 1953                             1930-1948      1    2    3    4
 8. Adler, 1953              California     1945           1    2    3    4
 9. Thomas & Locke, 1963     Ohio           1948-1952      1    3    2    4
10. Thomas & Locke, 1963     New York       1949-1951      1    3    2    4
11. Malzberg, 1964           New York       1948-1951      1    3    2    4
    Sum of ranks                                          11   27   28   44


Note.—M = married, S = single, W = widowed, D = divorced.


showed lower rates among the married for 
Blacks of both sexes (Malzberg, 1956). 
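
The rank sums at the foot of Table 1 can be checked by simple column addition, as in the following sketch. The rank values are read from Table 1; where the printed entries are unclear (the widowed rank in rows 2 and 3), they are assumed here to be 3 and 2, respectively, which is the only reading consistent with each row containing the ranks 1 through 4 once.

```python
# Ranks of incidence for (M, S, W, D) in the 11 studies of Table 1.
# Rows 2 and 3 include assumed values for W (see the accompanying text).
table_1_ranks = [
    (1, 2, 3, 4),  # 1. Dayton, 1936
    (1, 2, 3, 4),  # 2. Dayton, 1936 (W assumed to be 3)
    (1, 3, 2, 4),  # 3. Landis & Page, 1938 (W assumed to be 2)
    (1, 2, 3, 4),  # 4. Dayton, 1940
    (1, 2, 3, 4),  # 5. Malzberg, 1940
    (1, 3, 2, 4),  # 6. Odegaard, 1953
    (1, 2, 3, 4),  # 7. Adler, 1953
    (1, 2, 3, 4),  # 8. Adler, 1953
    (1, 3, 2, 4),  # 9. Thomas & Locke, 1963
    (1, 3, 2, 4),  # 10. Thomas & Locke, 1963
    (1, 3, 2, 4),  # 11. Malzberg, 1964
]

# Sum each column to obtain the rank sum for each marital status group.
labels = ("M", "S", "W", "D")
rank_sums = dict(zip(labels, (sum(column) for column in zip(*table_1_ranks))))
print(rank_sums)
```

On this reading, the married rank lowest in every study and the divorced highest, while the single and the widowed are nearly tied in the intermediate positions.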

In investigating the high incidence of mental
disorders among the divorced, Blumenthal
(1967) found significant differences in mental
health between divorced and nondivorced persons,
with the divorced reporting more depression,
more “nervous breakdowns,” and
more drinking problems. Blumenthal concluded
that the high rate of admissions of
divorced persons to mental hospitals is a reflection
of a real difference in mental health
between divorced and nondivorced persons.

Studies of hospitalization rates and marital
status are sometimes criticized because the
differences in rates may be due to the effect
of mental disorders on the marital status of
individuals before they are admitted to a
mental hospital. For example, if mental disorders
tend to lead to divorce, this would
boost the rate of mental disorders among the
divorced and, at the same time, decrease the
rate for married persons. The only study in
which a specific attempt was made to control
for this factor is that of Adler (1953), who
calculated rates by marital status at onset
of the disorder rather than at admission, that
is, before the disorder could affect marital
status. She found that married persons
had the lowest rates for mental disorders ...

There have been ... to determine
whether marital status ...


= widowed, D = divorced. 


"- 


the type of mental disorder a person may 
acquire. Malzberg (1940) reported lower rates 
for all psychoses among the married with the 
exception of general paresis and alcoholic psy- 
choses. The high incidence of schizophrenia 
among the unmarried has been noted by 
various investigators including Malzberg 
(1940), Odegaard (1946, 1953), and Frumkin 
(1955). The explanation that is usually of- 
fered for this finding is that certain person- 
ality traits and behaviors that are consistent 
with a predisposition toward schizophrenia 
tend to prevent marriage among these indi- 
viduals. It must be pointed out, however, 
that schizophrenia tends to be the predomi- 
nant diagnosis among all of the marital status 
groups with the exception of the widowed 
(Malzberg, 1940). Senile and arteriosclerotic 
disorders predominate among the widowed, 
primarily because of the age composition of 
this group. 

On the whole, there seems to be no clear 
or strong association between marital status 
and the various diagnostic categories of men- 
tal disorders. This lack of relationship be- 
tween types of mental disorders and socio- 
psychological variables such as marital status 
may be more true of the psychoses than of 
the neuroses. Dunham (1968), in a survey 
of epidemiological studies, stated that one 
finding which seems to be gradually emerging 
from research conducted over the past 30 years 
is that psychoses are not readily identified 
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with any specific social environment. He sug- 
gested that research workers turn their at- 
tention to the study of the psychoneuroses 
and the psychopathic disorders since the evi- 
dence for the “social roots" of these dis- 
orders appears to be more substantial than 
for the overtly psychotic. 

Dunham's suggestion implies that constitu- 
tional factors may be of more importance than 
environmental factors in the origin of the 
psychoses. This is one of the approaches 
taken in attempting to explain the differences 
in hospitalization rates among the various
marital status groups. Those who stress con- 
stitutional factors in explaining the lower 
rates of mental disorders among married 
couples postulate that those who marry are
healthier, both physically and mentally, than 
those who do not marry. Those who emphasize 
environmental factors maintain that marriage 
has a protective and stabilizing influence on 
the partners, which tends to prevent the oc- 
currence of mental disorders. The “environ- 
mental" explanation is, of course, contrary to 
Dunham's conclusion that the psychoses are 
not related to a particular social environment.

Dayton (1936) found that a combination 
of environmental and constitutional variables 
is necessary in explaining differential rates of 
mental disorders. Malzberg (1940) also took
both constitutional and environmental factors 
into account in explaining his data. We 
might be reminded at this point that any 
explanation offered for differential rate pat- 
terns is based only on infere 
patterns. Differen 


I pe are reviewe 
tion of this article, 
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ogy in marital partners begin with the vast 
amount of evidence indicating husband-wife 
similarity on a variety of measures. Physical 
traits such as height, cephalic index, eye color, 
hair color, health, and longevity are positively 
correlated in husbands and wives (Neilson, 
1964). Husbands and wives tend to be simi- 
lar in social characteristics such as education, 
race, and religious affiliation (Winch, 1958). 
Evidence indicating that husbands and wives 
also tend to resemble each other in person- 
ality characteristics has appeared in the litera- 
ture for several decades. In 1939, Richardson 
cited moderate but positive correlations be- 
tween husbands and wives in attitudes, in- 
terests, and intelligence. Willoughby (1936) 
noted that husband and wife tend to resemble 
each other in mental health as well. He
reported a modest but positive correlation of .2
for neuroticism in husbands and wives.
Evidence for husband-wife similarity with
regard to personality characteristics has been
interpreted either as supporting the assortative
mating theory or the interactional theory.
According to the assortative mating
theory, husband-wife similarity is due to mate
selection. The assumption is that individuals
tend to choose mates who resemble them in
various respects, including level of emotional
stability. Those who favor the interaction
theory contend that husband-wife similarity
is due to interpersonal factors, that is, marital
partners tend to become more alike through
their interaction with each other. In 1936,
Schooley conducted a study of 80 couples
married from 1 to 20 years. She concluded
that husbands and wives tend to become
more alike in terms of values and neurotic
tendencies as length of marriage increases.
This is one of the earlier findings suggesting
that interaction may play an important role
in the development of psychopathology in
married couples.
Numerous studies of neurotic disorder in
married couples have been reported. Slater
and Woodside (1951) compared the marriages
of 100 neurotic soldiers with the marriages of
an equal number of general hospital patients.
hey found significantly more neuro iv 
the wives of neurotic husbands, and th anoo 
reported more “nervous traits” in chi ive 
than did the control husbands and 


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Slater and Woodside attributed this simi- 
| larity in neurotic couples to assortative mat- 
ing. They postulated that an individual with 
neurotic tendencies often chooses a mate who 
has similar tendencies. 

Ryle and Hamilton (1962) studied the
prevalence of neurosis in 50 London working-class
couples. Neurosis in husbands and wives
was significantly related, although minimal
neurosis was found to occur more often in
wives than in husbands. Comparable results
were obtained in a subsequent study (Pond,
Ryle, & Hamilton, 1963b). No significant association
was found between neurosis in marital
partners and social factors such as social
class, income, and housing (Pond, Ryle, &
Hamilton, 1963a). The authors concluded that
assortative mating does not provide a complete
explanation for neurosis in marital partners,
since, in the couples studied, there was
no apparent tendency for individuals with
disturbed backgrounds to select mates from
similar backgrounds. Several other studies
have indicated that there is a strong association
in incidence of neurotic disorders in husbands
and wives (Hare & Shaw, 1965; Kellner,
1963).

The only study in which an attempt was 
Made to examine a specific type of neurotic 
disorder is that of Woerner and Guze (1968). 

Their investigation was limited to married


male patients with a diagnosis of hysteria. 


hey f % of the close relatives 
Me had at one time 
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a definite association between the occurrence 
of neurotic symptoms in husbands and wives 
who have been married for many years, little 
association for partners recently married, and 
no association during the premarital period. 
Buck and Ladd interpreted their results as 
lending support to the notion that neurosis in 
married couples is a manifestation of inter- 
personal conflict rather than the result of as- 
sortative mating. 

Instead of correlating psychopathology in 
husbands and wives, Eshleman (1965) ex- 
amined the relationship between the emotional 
stability of each of the partners and the level 
of marital integration. The latter, which
might be considered a measure of “marital
health,” was based on husband-wife discrepancies
on the Interpersonal Check List
and overall satisfaction with the marriage. 
Eshleman found that psychological and psy- 
chosomatic symptoms were inversely related 
to marital integration in the 82 couples 
studied. He attributed no causation to either 
variable but concluded only that the health 
of an individual is related to the health of 
his marriage. 

Unlike the neuroses, the etiology of the 
psychoses is often attributed to hereditary or 
constitutional factors. There is one type of 
psychosis, however, which is usually considered
theoretical proof that mental disorders
are “contagious.” This is folie a deux
(madness of two). When two individuals who 
are closely associated share the same delu- 
sional ideas, it is difficult to look at this 
as other than an interactional phenomenon. 
Cases of folie a deux involving husband and 
wife have been reported by Oberndorf (1934) 
and Gralnick (1942). More recently, Prins 
(1950) and Zabarenko and Johnson (1950) 
have described cases of folie a deux in which 
both husband and wife shared similar paranoid 
delusions. Prins noted an alternation of symp- 
tomatology: When one spouse took over the 
other’s delusions, the other lost his symptoms 
but resumed them again when the partner was 
hospitalized.

A series of studies of psychotic disorders in 
husband and wife utilize an observed versus 
expected ratio of mental hospital admissions, 
This ratio is based upon the frequency with 
which both members of a married couple are 
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admitted to a hospital within a given period 
of time and the frequency with which only 
one member of a married couple is admitted 
within the same period of time. Penrose 
(1944) was one of the first to conduct this 
type of study. He located 22 cases of husband 
and wife admission to a Canadian mental 
hospital. In eight of the couples, husband 
and wife were admitted within the same year. 
He calculated the number of couples that 
would be admitted by chance within the 
same year and found the observed frequency 
to be nine times greater than the expected 
frequency. In 8 (36%) of the 22 couples, 
husband and wife had the same diagnosis. 
Penrose suggested the following three hypotheses
as possible explanations for his findings:
(a) The interaction of the partners may
produce abnormal mental reactions in both;
(b) similarity of environmental stresses such
as economic status may contribute to a com- 
mon breakdown; and (c) persons who are 
similar, both physically and mentally, tend to 
marry one another (assortative mating). Pen- 
rose considered diagnostic concordance as evi- 
dence for assortative mating and concluded 
that assortative mating seems to exist with 
respect to traits that form the background of 
mental disorders but that more data are 
needed to determine the relative importance 
of the other two hypotheses. 
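
The arithmetic behind Penrose's figures can be sketched briefly. The observed count (8 couples admitted within the same year), the reported ratio (nine times the expected frequency), and the concordance count (8 of 22) come from the text; the expected frequency itself is not reported here, so the value below is merely the one implied by the other two figures, and the function names are illustrative.

```python
def observed_expected_ratio(observed: float, expected: float) -> float:
    """Ratio of the observed frequency to the frequency expected by chance."""
    if expected <= 0:
        raise ValueError("expected frequency must be positive")
    return observed / expected


def percent(part: int, whole: int) -> int:
    """Percentage of `whole` represented by `part`, rounded to the nearest integer."""
    return round(100 * part / whole)


observed_same_year = 8   # couples admitted within the same year (from the text)
reported_ratio = 9       # observed frequency was nine times the expected (from the text)

# Expected frequency implied by the two reported figures (a derived value, not a reported one).
implied_expected = observed_same_year / reported_ratio

same_diagnosis_pct = percent(8, 22)  # couples sharing the same diagnosis

print(round(implied_expected, 2), same_diagnosis_pct)
```

The implied chance expectation is thus less than one couple per year, which is why even eight same-year admissions yield so large a ratio.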

Gregory (1959), Kreitman (1962), and
Neilson (1964) conducted studies similar to
that of Penrose and


(2) The dis- 


ay represent a reaction 
to the breakdown of the other; 


MARJORIE A. CRAGO 


factor seemed to play a part in only a very 
small percentage of the disorders of the 
couples studied. 

Kreitman (1964) later made an attempt to 
examine more closely the effects of marital 
interaction on psychopathology. A group of 
75 patients (31 males and 44 females) and 
95 controls (32 males and 63 females) com- 
pleted the Maudsley Personality Inventory 
and the Cornell Medical Index. They also 
supplied biographical data. Patients’ spouses 
were found to be more neurotic and had more 
physical symptoms than same-sex control sub- 
jects. In examining the effect of duration of 
marriage on mental health, the following 
trends were noted: In the early years of mar- 
riage, the control couples showed significant 
positive correlations on extraversion, neu- 
roticism, and physical health. Conversely, the 
patients and their spouses had low or nega- 
tive correlations. At the intermediate dura- 
tion of marriage (10-17 years), the correla- 
tions in each group were approximately the 
same. Thereafter, the controls continued to 
show a fall in correlation on the neuroticism
and mental health scores, while the patients 
and their spouses continued to rise. Kreitman 
concluded that no one theory can account 
for all of the findings in both normal and 
patient groups. The theory of marital inter- 
action seemed more in accord with the data 
for patients and their spouses but did not ac 
count for the initially high correlations among 
normal couples.

In subsequent studies, Kreitman (1968a,
1968b) compared neurotic couples with psychotic
couples in an attempt to isolate the
effects of marital interaction and assortative
mating. Of 74 couples who had once been
mental hospital inpatients, 31 (42%) of the
couples had the same diagnosis. In 23 of
the couples, the onset of the disorder occurred
in both partners before marriage. It
occurred after marriage in both partners in 43
couples. When the psychoses were considered,
there was little difference in diagnostic concordance
between the premarital and postmarital
onset groups. For neuroses and personality
disorders, however, the postmarital
group had a considerably higher concordance
rate (50% as compared to 33%). Thus the
findings for the psychoses were compatible
with the assortative mating theory, whereas
the interaction theory was more consistent
with the data for the neuroses.

That psychopathology occurs with greater
frequency in the spouses of mental patients
than in the spouses of “normal” individuals
seems unquestionable. However, there are a
number of difficulties involved in ascertaining
the meaning of this finding. Very often, attempts
to isolate the effects of assortative
mating and marital interaction have been of
an either/or nature. Yet, as Neilson (1964)
pointed out, there is usually a network of
factors (such as those outlined by Penrose
and Gregory) with varying degrees of relevance
for the onset of psychopathology. Research
aimed at ascertaining the relative contribution
of each of these factors has indicated
that both assortative mating and marital
interaction are important, while factors such
as environmental stress caused by socioeconomic
conditions contribute to a lesser degree.
There is some evidence for the contention
that interactional conflicts may be of greater
importance in the development of neuroses
than in the psychoses.

Many of the weaknesses in research studies
are related to the various assumptions that
the investigator makes in drawing conclusions
from his data. There are several assumptions
in the studies reviewed in this section
that seem open to question. For example,
it is often assumed that diagnostic ...
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Mendell (1956). In studying the communica- 
tion of neurotic patterns over Several gen- 
erations, they found more differences in neu- 
rotic patterns between husband and wife than 
among other family members. 

In many respects, it seems more difficult to 
design a study that will adequately test the 
interactional theory than to design a study 
testing the assortative mating theory. One of 
the main reasons for this difficulty is the 
nature of the variable under study. Marital 
interaction is an ongoing and complex process 
that cannot be easily measured or described. 
In the studies reviewed thus far, no differenti- 
ation has been made between the marital in- 
teraction of one couple and that of other 
couples. Moreover, no attempt has been made 
to specify the type of pathology associated 
with particular patterns of marital interac- 
tion. There have been efforts in this direction, 
however, and these studies are considered
next.

DISTURBANCE IN MARITAL INTERACTION 


A number of opinions and observations 
have been offered in an attempt to define 
neurotic marital interaction. Ackerman (1954) 
and Gomberg (1956) conceptualized marital 
interaction as something more than the sum 
of the two personalities involved. Thus, the 
quality of the marital relationship is not 
merely a by-product of the degree of health 
or disturbance in the two partners. Ellis
(1958), on the other hand, maintained that 
it takes a double neurosis for neurotic inter- 
action to occur; that is, the health of the 
relationship is directly related to the health 
of the individual partners.

In explanations of the dynamics underlying 
neurotic interaction, the emphasis is often 
placed on unconscious aspects of the relationship
between the two partners (Dame,
Finck, Reiner, & Smith, 1965; Dicks, 1959,
1964; Ellis, 1964; Huneeus, 1963; Kubie,
1956; Sarwer-Foner, 1963). Van Emde Boas
(1962), for example, described what he calls
“zipper” relationships in which marital conflicts
satisfy the unconscious needs of both
partners to such an extent that they cannot 
do without them. Most of these studies are 
either theoretical or are based on observations 
of a small number of married couples with 
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the exception of Dame et al. (1965) who 
studied 37 couples in treatment. 

Attempts have been made to isolate cer- 
tain factors in neurotic interaction that may 
lead to the development of psychiatric dis- 
orders in the spouses. McGee and Kostrubala 
(1964) pointed to the disruption of a neurotic 
equilibrium in the marriage as a primary 
factor in the initiation of treatment by the 
spouses. In the six couples studied, a specific 
event seemed to cause the disruption without 
the couple themselves being aware of the 
significance of the event. 

The patterns of neurotic interaction in mar- 
riage are numerous and varied. In a study of 
14 couples receiving psychotherapy, Martin 
and Bird (1959) found that psychopathology 
in the wives was associated with a marriage 
pattern that they described as the “lovesick” 
wife and the “cold, sick” husband. Other types 
of neurotic marital patterns have been de- 
scribed by Mittelman (1944, 1956) and 
Gehrke and Moxom (1962). Their descrip- 
tions, which are based upon observations of 
couples in treatment, are quite general. They 
refer to such factors as dominance versus pas- 
sivity, detachment versus emotional depen- 
dence, and conflict in masculine-feminine 
roles. 

A number of research workers have turned 
their attention to the description of patterns 
of marital interaction associated with specific 
types of psychopathology. Modlin (1963) 
studied five married women diagnosed as
paranoid. He found that the precipitating 
factor in the onset of the disorder was dis- 
ruption in the husband-wife dyad due to the 
husband's withdrawal. The husbands were 
found to be compliant and dominated by the 
wife. In all of the couples, sexual intercourse 
had been greatly reduced or had ceased. Du- 
pont and Grunebaum (1968) and Carter 
(1968) also found that the husbands of 
paranoid women tend to be rather passive.
They also noted a reduction of sexual intercourse
in these marriages. The frequency of
... in sexual adjustment among paranoids
and their spouses has been mentioned
by several other investigators as well (Klein
& Horwitz, 1949; Miller, 1941; Revitch,
1954).


Jacobson 
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found that the relationship tended to be over 
close and symbiotic. In some cases, depression 
in one of the partners seemed to be precipitated
by increasing dependency on the
part of the other. Her conclusions are based
on clinical observation and require further
substantiation.

Efforts to examine the marital dynamics of
schizophrenics and their spouses have been
more extensive than with any other type of
psychiatric disorder. Bowen (1960) spoke of
“emotional distance” between the marital
partners in schizophrenic families. Lidz, Cornelison,
Fleck, and Terry (1957) referred to
marital schism and marital skew in schizophrenic
families. In marital schism, there is
open strife that divides the entire family.
Marital skew results when the serious psychopathology
of one marital partner dominates
the home.

In describing the interaction between schizophrenic
partners, Bychowski (1956) noted
that usually one of the partners adapts to the
other and takes over some of the other's
psychopathology by identification. In a study
of seven female schizophrenics and their husbands,
Becker (1963) reported sex-role confusion
in both partners. The wives complained
that their husbands took over traditionally
feminine activities, and the wives themselves
seemed to forsake their femininity, that is,
dressed dowdily and frequently stopped menstruating.
Sampson, Messinger, and Towne rh 
conducted an intensive study of the Lge 
relations of 17 hospitalized  schizopht® t 
women and their husbands. Marital conf 
precipitated the wife's breakdown in Rer e 
Ways. In six cases, the wife's conflicts jv $ 
related to a crisis of separation. The Wital 
were torn between commitments to the m^ ly 
family and to the parental family, particu | nt 
to the mother. Another type of crisis. eV an 
in four cases, involved the wife's ident ij. 
tion. Marital life revived threatening pe? 
fications with the mother, particularly 1 of 
the wife herself became a mother. I" 
the 17 marriages, the relationship 
the partners was characterized by ! 
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withdrawal and separate worlds of 
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has been examined by Lehrman (1967). He 
cited a need for abasement, exaggerated narcissism, and exhibitionism as
contributing factors in the development of "mixed marriage"
psychopathology. Further research is needed in this area since Lehrman's
conclusions are based on a small number of case histories.

Several research workers have sought to apply particular theories to the
analysis of disturbed marital interaction. Aside from psychoanalytic
theory, other theoretical orientations that have been applied are
communication theory and role theory. Communication theory was first
utilized in explaining interpersonal interaction in studies of
schizophrenic families. Since that time it has been extended to the
analysis of other relationships, including the husband-wife dyad. Haley
(1963) suggested that discrepancies between verbal statements and
behavior may culminate in serious marital disturbance. Rabkin (1967)
spoke of the marital conflicts that may arise when husband and wife come
from families with different communication codes.

Role theory has been applied to the analysis of interpersonal
interaction by Spiegel (1957). He proposed that complementarity
(reciprocity) of roles is necessary in maintaining equilibrium in any
social system. When role complementarity fails, role conflict may become
internalized by the participants and produce neurotic symptoms.
Complementarity fails when the role partners disappoint each other's
expectations. Tharp and Otis (1966) and Crago and Tharp (1968) also
maintained that the stability of the marital pair is dependent upon role
complementarity. They contend, as does Spiegel, that failure in role
complementarity can lead to intrapsychic distress with accompanying
symptoms.

In generalizing from [...] interaction that have been reported, [...]
individual's symptoms. That marital interaction
prior to and at the time of the onset of psy- 
chopathology remains the same may be an 
unwarranted assumption. The only way to 
adequately evaluate this assumption is 
through longitudinal studies of marriage rela- 
tionships. Unfortunately, the time and ex- 
pense involved in longitudinal research may 
prevent the appearance of such a study in the 
near future. 

Our knowledge of marital interaction is also 
limited by its nonspecificity. For example, in 
several studies of paranoid women, the hus- 
band’s withdrawal was cited as a precipitating 
factor in the wife’s psychosis. This may be an 
important factor, but we have no way of 
knowing whether it is peculiar to the develop- 
ment of a paranoid disorder rather than an- 
other type of disorder. A definitive answer to 
this question cannot be achieved without a 
direct comparison of the interaction patterns 
in the marriages of individuals with varying 
types of psychiatric disorders. A comparison 
of this sort has not been made, although 
there have been numerous, isolated studies of 
marital interaction among individuals with a 
particular type of psychiatric disorder. 

Several other criticisms of the methodology 
used in studies of marital interaction might be 
mentioned. First, control groups are not often 
used. Second, most of the studies are based 
upon psychotherapeutic observation, case his- 
tory data, or interviews. The weaknesses of 
these approaches in studying interaction in 
intimate groups are discussed at length by 
Rabkin (1965) and Fontana (1966). 

The most direct method of analyzing in- 
teraction consists of systematic recording and 
coding of ongoing interactional processes.
Family interaction has been evaluated in this 
manner, but there seem to have been no 
studies of marital interaction in symptomatic 
couples in which such an approach was used. 
In spite of certain weaknesses in this ap- 
proach, such as differences between interac- 
tion in a laboratory setting and interaction 
in the home (O'Rourke, 1963), the direct 
observational method is regarded as having 
fewer methodological inadequacies than any 
of the other approaches used in studying in- 
terpersonal interaction in intimate groups 


(Fontana, 1966). 
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Marriage research seems to lag behind 
family research in terms of the methodology 
used in analyzing interaction. However, in 
spite of fewer methodological deficiencies, 
family interaction studies have failed to yield 
conclusive results. After reviewing 40 years of 
family research, Frank (1965) reported that 
no factors have been found in the parent- 
child interaction of schizophrenics, neurotics, 
and those with behavior disorders that can 
distinguish one group from another or any 
of the groups from the control subjects. On 
the whole, neither marital nor family interac- 
tion studies have produced impressive results, 
perhaps in part because of the difficulties en- 
countered in attempting to analyze and mea- 
sure the inherently complex phenomena that 
constitute interpersonal interaction. 


THE PATIENT'S SPOUSE


As was mentioned previously, there is con- 
siderable evidence indicating that mental dis- 
orders occur with greater frequency in the 
spouses of mental patients than in the spouses 
of “normal” individuals. Findings such as 
this have led to a more intensive study of the 
patient's spouse. 

Several studies of the personality characteristics of spouses of mental
patients have been reported. Harlan and Young (1958) reported that the
wives of 10 chronic patients were characterized by a helpless attitude
toward their husband's condition, and many of them exhibited
sadomasochistic tendencies. Fry (1962) described the spouses of patients
with an anxiety syndrome as negativistic, anxious, and withdrawn. In
none of the seven couples was there a successfully functioning marriage.
The spouse often had a history of symptoms resembling the symptoms of
the patient. The patient's symptoms appeared protective; as long as he
manifested symptoms, the spouse's symptoms were suppressed. [...]phy
(1963) found the wives of schizophrenic patients to be less expressive
and less assertive during interviews than the wives of nonschizophrenic
patients.
The impact of psychotherapy and other [...] upon the untreated spouse
[...] an area of investigation. Therapists seem to assume that
psychotherapy has no appreciable effect on the untreated spouse or that
the effects can only be positive. In contrast to this assumption,
Hurvitz (1967) maintained that individual psychotherapy conducted along
classic lines may further disturb the relationship between the spouses.
Even when both spouses are involved in treatment, there is the danger
that the therapist may disrupt an unhealthy balance when there is little
to be gained by changing the relationship (Carroll, Cambor, Leopold,
Miller, & Reis, 1963).

Becker (1963) and Leichter (1962) have noted the subtle undercutting and
anxiety responses displayed by the spouse when he is faced with change
in the partner as a result of treatment. The spouse may consciously hope
for such changes but is often unconsciously fearful of changes in his
partner and the effect they may have on his own role and the
relationship. In some cases, changes in the partner may reveal the
spouse's own inadequacies more clearly (Moran, 1954).

Myers (1959) found that feelings of guilt and exclusion were prevalent
among 20 husbands whose wives were receiving psychotherapy. Lichtenberg
and Pao (1960) categorized the reactions of 91 husbands to their wives'
treatment as follows: constructive action, obstructive action, subtle
rejection, [...] rejection, constant vacillation, and maintenance of
previous pathological relationship. The largest percentage (46%) of the
husbands' reactions fell into the last category.

Kohl (1962) followed a similar procedure in evaluating the reactions of
21 wives and [...] husbands who displayed signs of pathology at a time
when the patients were showing improvement. Four major types of
pathological reactions were noted: recurrence of alcoholism, threats of
divorce, resentment toward the therapist, and depression associated with
acute anxiety.

Johnston and Planansky (1968) interviewed the wives of 36 schizophrenic
patients. They found that the wives' reactions to the husbands' disorder
could be grouped into the following four categories: (a) acceptance,
(b) blame (of self and others), (c) avoidance (intellectual and
physical), and (d) ambivalence. Fifty-six percent [...] rated in the
[...] or separation.

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Kahn (1960) studied spousal reactions to 
psychotherapy within the framework of the 
authoritarian personality. She found that 
spouses rating high in authoritarianism and
conservatism responded more adversely to
changes in family routines due to the partner’s
psychotherapy than spouses who did not rate
high on these factors. 

Sager, Gundlach, Kremer, Lenz, and Royce 
(1968) evaluated the effects of psychoanalysis 
on the spouse. Data were collected through 
questionnaires filled out by 79 psychoanalysts 
giving information about 432 married females 
and 304 married males in treatment. In general, improvement in the
patient was associated with improvement in the spouse. It should be
noted, however, that 51% of the spouses were also in treatment, often
with the same therapist as the patient. Of the spouses who were not in
treatment, 8% were adversely affected by the treatment of the patient.
When the spouse became disturbed, the marriage was more likely to end in
divorce or separation than when the spouse was undisturbed.

Dorsey (1961) postulated that the impact of the partner's
hospitalization on the spouse might take the form of a change in
self-concept and a change in the spouse's conception of the patient. She
found that the spouses' concepts of the patients' personalities tended
to remain stable over time, and there were only slight indications of
instability in the spouses' own self-concepts. However, it is very
likely that the 1-week interval between ratings [...] may have been too
short to produce changes of a significant nature.

The processes by which [...] defined as emotionally disturbed [...]
displayed more bizarre behavior. Initially, the [...] behavior by
interpreting it in [...] interpretation of her husband's behavior. The
wife tended to place less emphasis on her husband's "strange ideas" if
he could fulfill the roles of wage earner, husband, and father.

Safilios-Rothschild (1968) examined the relationship
between marital satisfaction and
definition of the spouse’s behavior as abnor- 
mal. The sample consisted of 16 wives and 12 
husbands of hospitalized patients in Greece. 
The satisfied and dissatisfied spouses did not 
differ regarding the definition of the disturbed 
partner’s behavior as deviant. However, there 
were significant differences between the two 
groups with regard to labeling this deviance as 
“mental illness.” Those who were satisfied 
with their marriages tended to diminish the 
seriousness of the partner’s disorder, attribut- 
ing it to a nervous rather than mental condi- 
tion, whereas the dissatisfied spouses were 
more willing to accept a psychiatric diagnosis. 

The effect which the marital relationship 
may have on the posthospital adjustment of 
former mental patients has been the focus 
of numerous investigations. Lipton and Kaden 
(1960) found that marital satisfaction and 
conditions existing at the time of marriage 
were related to the posthospital adjustment 
of 31 male schizophrenics. Of the patients whose marriages rated above
the median in success, 75% had good posthospital adjustment as compared
to only 29% who had good posthospital adjustment and unsatisfactory
marriages.
Posthospital adjustment has also been explored by Davis, Freeman, and
Simmons (1957), who found that former mental patients with high levels
of role performance were more often found in conjugal families than in
parental families. These results were supported in several subsequent
studies (Freeman & Simmons, 1958a, 1958b). Patients with the highest
performance levels were found in families with moderate to high
expectations of the patient (Simmons & Freeman, 1959). On the basis of
the results of these studies, Freeman and Simmons developed the
"tolerance of deviance" hypothesis. They hypothesized that
rehospitalization is dependent upon the patient's level of instrumental
role performance in regard to employment, household tasks, and social
participation. They further hypothesized that parental families are more
tolerant of deviant behavior than are conjugal families.

Freeman and Simmons’ “tolerance of devi- 
ance" hypothesis stimulated further research 
in this area. Some of these studies produced 
results at variance with the hypothesis 
(Brown, Carstairs, & Topping, 1958; Linn, 
1962). Angrist, Dinitz, Lefton, and Pasamanick (1961) also obtained
results in opposition to the hypothesis. Patients were rehospitalized
more frequently from conjugal families; but, at the same time, these
families showed greater tolerance of deviant behavior than the relatives
of patients who were not rehospitalized. The most marked differences
between the two groups were found in psychological functioning, with the
returnees manifesting more symptoms and symptoms of greater severity
than those who succeeded in
remaining in the community. 

In later research, Freeman and Simmons 
(1963) were forced to revise their original 
hypothesis. In a study of 649 discharged pa- 
tients and their families, they found that 
differences between patients who remained in 
the community and patients who were rehos- 
pitalized were more marked in the realm of 
symptomatic behavior than in the realm of instrumental role performance.
There was no significant difference in the proportion of patients
rehospitalized from conjugal families and from parental families,
although patients in conjugal families performed at higher occupational
and social levels than patients living in parental and sibling homes.
Freeman [...] even [...] parallel phenomena.

Several recent studies shed more light on questions regarding [...]
tolerance of deviance, and psychological functioning [...] posthospital
adjustment. Straight (1965) found a positive relationship between the
wife's acceptance of her husband's behavior (tolerance of deviance) and
his posthospital adjustment. Lefton, Dinitz, Angrist, and Pasamanick
(1966) attempted to establish normative standards of role expectations,
role performance, and psychological functioning by comparing 62 married
females who displayed functional disorders with 60 [...] women who were
neighbors of the patients. As it turned out, the patient wives and
control wives differed significantly on only one of the measures,
namely, psychological functioning.
Brodsky (1968) reported that the work status of wives is an important
factor in hospitalization. Nonworking wives were found to have fewer
hospitalizations than wives who worked outside the home. Brodsky
compared the home to a sheltered workshop where psychological stresses
are likely to be fewer than in other types of work situations.

Miller and Barnhouse (1967) have described some of the characteristics
of married mental patients who return to the hospital. Of the 80 cases
studied, 67% were wives and 33% were husbands. Wives tended to have more
rehospitalizations than husbands and had spent nearly twice as long in
state hospitals as husbands. Broken marriages were common in this group;
31% of the husbands and 28% of the wives had one or more previous
marriages.

Griffin (1967) examined the relationship between dominance in marriage
and the posthospital adjustment of male patients. Twenty male patients
who were rehospitalized were compared with 20 male patients who
succeeded in remaining in the community for at least 2 years after
discharge. She found that the [...]
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terms of various categories as Lichtenberg 
and Pao (1960) and Kohl (1962) have done. 
But the next step to be taken is a more dif- 
ficult one. We are not yet able to specify how 
changes in one person produce positive or 
adverse effects in another person. This necessitates an evaluation of
interactional processes, the inherent complexities of which have already
been mentioned. At any rate, an answer to this question would seem to
require gathering data regarding the nature of interaction over long
periods of time, both during treatment and nontreatment. Fox (1968)
suggested that more longitudinal studies are needed because the effects
of treatment upon the spouse may be related to the time chosen for
investigation. He referred to the findings of Glasser (1963) regarding
changes in family equilibrium during the treatment of a family member.
Glasser found that the effect on the family was different during
different stages of treatment.

There have been relatively few studies of the impact of the patient's
hospitalization on the spouse. The results of the studies that have been
done are quite general. It would seem that any evaluation of the
relationship between the spouse's attitude toward the patient and the
patient's posthospital adjustment must hinge on an exploration of the
impact of the patient's hospitalization on the spouse. In spite of the
reciprocal nature of the patient-spouse relationship, there has been a
greater emphasis on the spouse's effect on the patient than the
patient's effect on the spouse.

Most of the studies of posthospital adjustment of married patients have
been concerned with testing various aspects of Freeman and Simmons'
"tolerance of deviance" hypothesis. The conclusions that can be drawn
from these
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SUMMARY AND CONCLUSIONS 


There are certain generalizations that can 
be abstracted from the research studies re- 
viewed here. A substantial amount of evi- 
dence indicates that married couples are less 
likely to be admitted to mental hospitals than 
are other marital status groups. When mental 
disorders do occur among the married, both 
partners are likely to manifest some degree 
of disturbance. The spouse is affected not 
only by the partner’s disorder but also by the 
partner’s treatment and hospitalization. The 
tendency for psychopathology to occur in 
both partners is usually explained in terms of 
assortative mating and/or marital interaction. 
A number of patterns of disturbed marital in- 
teraction have been described. However, at 
this point, no clear or definite association has 
been delineated between particular patterns 
of marital interaction and the development of 
particular types of psychopathology. 

There are other findings in current mar- 
riage research that seem to follow a general 
pattern but require further exploration. For 
example, rates of neurosis as well as hos- 
pitalization and rehospitalization rates tend 
to be higher in wives than in husbands. Fur- 
ther substantiation of these results is needed 
along with an examination of the possible 
theoretical basis for such findings. One hy- 
pothesis that this reviewer considers both in- 
teresting and relevant is that the fulfillment of 
marriage roles may take a greater toll in terms 
of the wife’s mental health than is the case 
with husbands. The wife’s role in marriage 
has been described as expressive, integrative, 
and accommodating in contrast to the hus- 
band’s more rigid, instrumental role (Tharp, 
1963). One might conclude from these differences in husband and wife
roles that the wife must make the greater adjustment in marriage. This
is exactly what Luckey (1960) has concluded from her studies of
perception of self and spouse in married couples. It may be that the
necessity for such accommodation and adjustment in the performance of
marriage roles leads to a certain amount of strain in the wife which
may, in some cases, culminate in an emotional disorder. This hypothesis
might best be tested by conducting studies of marital interaction in
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tionship that may be involved in the develop- 
ment of psychopathology in the wife. 

Other suggestions that this reviewer might 
make with regard to future marriage research 
also necessitate an examination of marital 
interaction. For example, Dunham's (1968) 
contention that social factors, such as inter- 
action, play a greater part in the development 
of neurotic disorders than in the psychoses 
might be investigated further. In this re- 
viewer's opinion, marital interaction is, meth- 
odologically, the most difficult area of mar- 
riage research. But it is also one of the most 
intriguing in terms of the generation of theory 
and research hypotheses. 
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The major objective of the present study was to find in the factor-analytic 
investigations of L. L. Thurstone some information concerning factorial
intellectual abilities that have unique places in the
structure-of-intellect (SI) model, and, more incidentally, to determine
which SI abilities are represented in Thurstone's published primary
mental abilities (PMA) battery of tests. Although a number of his
factors can be cited as probable forecasts of SI abilities, in only one
or two instances was a Thurstone factor clearly representative of a
single SI ability, unconfounded with other abilities. This general
outcome was largely due to the fact that his analyzed batteries involved
too many different SI abilities for which the number and varieties of
tests were not adequate. In successive analyses, some of his factors
tended to converge in the direction of SI abilities, so that his
published PMA tests represent one SI ability each, with the exception of
the Reasoning and Number tests.


The termination of a program of intensive research on intellectual
abilities or functions in the Aptitudes Research Project (ARP) at the
University of Southern California seems to be an occasion for a
reexamination of L. L. Thurstone's "primary mental abilities," which
have been well known in the literature for the past 30 years. There is
much in common between his abilities and those issuing from the ARP
program, but there are also numerous points of difference that it would
seem useful to point out. For those who employ the Thurstone primary
mental abilities (PMA) tests, either in research or in practical
contexts, the newer information arising from deeper probing into the
nature of human intelligence has much to offer by way of psychological
interpretations.

Although a thorough reexamination of investigations of intellectual
abilities might be in order, the historical importance of the Thurstone
analyses gives them first call for consideration. In other words, this
article is not intended as a comprehensive review of factor analyses of
intellectual abilities. However, it mentions some of the more relevant
studies that have yielded significant information that helps in the
comparison of the Thurstone and ARP factors or that serve as connecting
links between them historically. A few other possible connecting links
have been neglected in order to keep the study within reasonable length.

Requests for reprints should be sent to J. P. Guilford, [...],
California 90213.


COMPARISONS OF THE TWO APPROACHES TO
FACTOR ANALYSIS


Preparatory to the comparisons that fol- 
low, it is important to have in mind certain 
features of the methodology by which the 
PMA and ARP abilities were investigated. 
There were basic similarities but also some sig- 
nificant differences. Let us note the similarities 


first. 


Some Similarities 

In both research programs, as much as 
possible, factor analysis was used as a hy- 
pothesis-testing device. In planning a research 
study, considerable effort was devoted to a 
rational consideration of a domain of abilities, 
with hypotheses proposed concerning abilities 
to be expected in an area of functioning. The 
supposed natures of those abilities suggested 
the kinds of tasks needed in order to test 
those hypotheses. Being one of the pioneers 
in such research, Thurstone had few prior 
findings to guide him toward hypothetical 
factor abilities. On the other hand, the ARP 
had the findings of Thurstone and others, particularly those of the Army
Air Forces (AAF) research program (Guilford & Lacey, 1947).
Furthermore, after some early analyses, the present writer developed the
structure-of-intellect model, which became the ARP's fruitful source of
hypotheses during most of its tenure.

Basically, the two methods of analysis were 
the same. The foundation for both was 
Thurstone's multiple-factor theory, which 
calls for extracting from a matrix of inter- 
correlations among tests a number of or- 
thogonal factors, and a rotation of axes in 
order to achieve meaningful, psychological 
variables. Thurstone used his own centroid 
method of extracting factors. The ARP 
started its program with that method but
soon switched to the principal-axes technique, 
which computers made possible. The differ- 
ence is not an important one. Differences in 
rotational procedures were important; they 
are mentioned shortly. 
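
The extraction and rotation steps just described can be illustrated with a minimal sketch. It is not a reconstruction of either program's computations: the correlation matrix is hypothetical, the function names `principal_axes` and `rotate` are introduced only for illustration, and the single-pass use of squared multiple correlations as communality estimates is an assumed variant of the principal-axes technique. The point shown is that an orthogonal rotation of axes changes the loadings but leaves the reproduced correlations and the communalities unchanged.

```python
import numpy as np

def principal_axes(R, n_factors):
    """Single-pass principal-axes extraction (assumed, non-iterated variant)."""
    R = np.asarray(R, dtype=float)
    # Initial communality estimates: squared multiple correlations.
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    reduced = R.copy()
    np.fill_diagonal(reduced, smc)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(reduced)
    order = np.argsort(eigvals)[::-1][:n_factors]
    lam = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(lam)

def rotate(loadings, angle):
    """Orthogonally rotate a two-factor loading matrix by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    T = np.array([[c, -s], [s, c]])
    return loadings @ T

# Hypothetical correlations among four tests (two verbal, two numerical).
R = np.array([
    [1.00, 0.60, 0.10, 0.10],
    [0.60, 1.00, 0.10, 0.10],
    [0.10, 0.10, 1.00, 0.50],
    [0.10, 0.10, 0.50, 1.00],
])

A = principal_axes(R, 2)          # unrotated orthogonal factors
B = rotate(A, np.pi / 6)          # same factor space, new axes
communalities_before = (A ** 2).sum(axis=1)
communalities_after = (B ** 2).sum(axis=1)
```

Because `A @ A.T` equals `B @ B.T`, the choice among rotated solutions cannot be settled by fit to the correlations alone, which is why the rotational criteria discussed next carry so much weight.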

Another important condition that was com- 
mon to the two programs was the use of rela- 
tively homogeneous groups of experimental 
examinees or subjects. The homogeneity was 
in terms of age and educational level, and 
sometimes, also, sex membership. The im- 
portance of homogeneity as an experimental 
control cannot be overemphasized. As Kelley 
(1928) pointed out, it is likely that many a 
Spearman g that had been found in early 
factor analyses was due to lack of homoge- 
neity of the tested population. Many tests 
correlate with age, education, or sex, which makes them correlate with
one another, whereas with homogeneous groups they might correlate zero
or near zero. [...] psychological g [...] multiple-factor [...] it out.
[...] universal scope [...] however, where [...] as zero (Guilford,
1964).
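
Kelley's point about heterogeneous samples can be made concrete with a small simulation. This is a sketch under stated assumptions, not data from either research program: the two "age groups," the size of the mean difference, and all variable names are hypothetical. Two tests are generated to be independent within each group, but both means rise with age, so pooling the groups produces a substantial correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # hypothetical number of examinees per age group

def simulate_group(mean_shift):
    # Two tests that are independent within a group; both means
    # are shifted by the same amount to mimic an age effect.
    test_a = rng.normal(loc=mean_shift, scale=1.0, size=n)
    test_b = rng.normal(loc=mean_shift, scale=1.0, size=n)
    return test_a, test_b

young_a, young_b = simulate_group(0.0)
old_a, old_b = simulate_group(3.0)

# Correlations within each homogeneous group.
r_young = np.corrcoef(young_a, young_b)[0, 1]
r_old = np.corrcoef(old_a, old_b)[0, 1]

# Correlation in the pooled, heterogeneous sample.
r_pooled = np.corrcoef(
    np.concatenate([young_a, old_a]),
    np.concatenate([young_b, old_b]),
)[0, 1]
```

Within each group the correlation stays near zero, while the pooled correlation is large; a general factor extracted from the pooled matrix would reflect the age difference rather than a common ability.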


[...] analyses [...] oblique rotations of axes, whereas all [...]
rotations were [...] orthogonal. The [...]
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combinations of analyzed test variable, due 
to incidental selections (Guilford & Zimmer- 
man, 1963). Degrees of obliqueness are prone 
to lack invariance as batteries of tests change 
and as each factor is represented by a different 
set of tests. Obliqueness is often to be at- 
tributed to failures to construct tests that 
provide good experimental controls. With optimal controls, perhaps all
factors should approach mutual independence. Until we have exerted all
reasonable effort to control precisely what examinees have to do in
order to make good scores in tests, we cannot reject the hypothesis of
orthogonality. At any rate, interpretations of the natures of factorial
abilities should not differ materially whether rotations are orthogonal
or oblique. Experience seems to bear out this proposition.
A more critical difference in rotational methods pertains to the basis
for locating axes in final solutions. Thurstone used graphic methods of
rotation, aiming at the two objective criteria of positive manifold and
simple structure. The former should apply when dealing with aptitudes,
which are unipolar concepts. The latter carries the hidden assumption
that the tests that we create tend to cluster along the lines of
fundamental psychological variables of individual differences.
Especially when tests are aimed at what is thought to be relatively
independent abilities, there is evidently enough truth in this
assumption to make the principle roughly applicable. It would certainly
not be a safe assumption in all cases. Experience shows that it is
easier to construct factorially complex tests than factorially univocal
tests. Accumulating experience has led the present writer to distrust
the simple-structure principle. Other voices are being heard in the same
vein (Butler, 1969).

In the ARP program, the overriding concern was to achieve invariance of
psychological factors (see Guilford & Hoepfner, 1971). No factor
proposed as a psychological concept has much claim for attention unless
it can be replicated in places where it should be found. The graphic
rotations performed in the early studies by the ARP observed
psychological meaning as a third important criterion for rotations. When
Cliff's method of rotation to congruency of factor matrices appeared on
the scene (Cliff, 1966), it became
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the adopted procedure. All the earlier analyses 
were then redone, using Cliff's method. In
applying the method, starting with the prin- 
cipal-axes matrix, the investigators set up one 
hypothetical factor matrix after another as 
targets toward which to aim rotations until 
a good fit and good replications were achieved. 
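
Rotation of a principal-axes matrix toward a hypothesized target can be sketched with the standard orthogonal Procrustes solution. This is offered as an illustration of the general idea, on the assumption that it approximates the kind of orthogonal rotation to congruence described above; it is not claimed to reproduce the ARP's actual computing routine. The target pattern, the starting loadings, and the name `rotate_to_target` are all hypothetical.

```python
import numpy as np

def rotate_to_target(A, target):
    """Return A @ T and T, where T is the orthogonal matrix that
    minimizes the Frobenius norm of (A @ T - target)."""
    U, _, Vt = np.linalg.svd(A.T @ target)
    T = U @ Vt
    return A @ T, T

# Hypothetical target: tests 1-2 load on factor I, tests 3-4 on factor II.
target = np.array([
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.7],
    [0.0, 0.6],
])

# Hypothetical unrotated loadings: the target pattern turned 40 degrees,
# standing in for an arbitrary principal-axes orientation.
theta = np.deg2rad(40.0)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = target @ Q

rotated, T = rotate_to_target(A, target)
```

In this constructed case the rotation recovers the target exactly. With real data the fit is only approximate, and, as the following paragraph cautions, trying one target after another tends to produce better fits to theory than are probably justified.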

It is recognized that such constraints pro- 
duce better fits to theory than are probably 
justified. But the procedure faces the fact 
of life that the large degree of indeterminacy incident to
factor-analytic processes does not justify blind faith in results
obtained by any known rigorous rotational operations. One outcome has
been of some interest. In general, when good fits to theory and good
invariance are reached, simple structure of a kind is also achieved. The
truth is that several different solutions, all with what looks like
simple structure, are possible. There have been attempts to achieve
mathematical definitions of simple structure, but rotations so as to
reach any one of them, such as in a varimax solution, may lead the
investigator seriously astray psychologically (Guilford & Hoepfner,
1969). Even in terms of mathematical definitions, the simple-structure
criterion is not to be fully trusted.

Apart from the rotation problem, there was an incidental feature of the
Thurstone analyses that endangered good solutions in some places. This
feature was his frequent use of alternate forms of the same test in the
same analysis, for example, three reading tests, three reasoning tests,
or four numerical-operation tests. The outcome is harmless when such
tests generate by themselves a factor that can be easily spotted as
being actually a specific. But the trouble is that they often carry
other tests along with them, which probably means a confounding of the
specific with something else, and the other tests may be robbed of what
should be variances from other common factors. The importance of this
situation for Number factors is pointed out later.


One historical handicap that Thurstone suf-
fered was that at his time no one expected the
very large number of intellectual abilities
such as have been found by the ARP. Ex-
pecting only a few primary mental abilities
and that each ability would have extensive
generality, Thurstone assembled batteries for
analysis with entirely too much variety among
the tests. The result was that many an SI
ability was represented by only one or two
tests. In ARP experience, in an exploratory
study it takes four or five tests aimed at a
hypothesized ability in order to ensure ade-
quate representation. In his first PMA analy-
sis, Thurstone probably had 28 SI abilities
represented, as suggested by current knowl-
edge of components of such tests. Eleven were
represented by one test each, 8 by two tests
each, and 9 by three or more tests. Thurstone
extracted and rotated 12 factors, of which he
interpreted 9, and felt reasonably sure of his
interpretations of less than that number. The
fact that other psychological factors were
latent in his data was demonstrated later by
Fruchter (1948) and Zimmerman (1953) in
reanalyses of those data.

Fig. 1. The structure-of-intellect model.

2 The number of intellectual abilities now demonstrated
by factor analysis is approximately … predicted by the
SI model.


STRUCTURE OF INTELLECT 


Since the ubiquitous frame of reference for 
what follows is the structure-of-intellect 
model, for the benefit of readers less ac- 
quainted with it, this very brief exposition on 
it is provided. 

Each SI ability is represented by a par- 
ticular cell in a three-dimensional matrix (see 
Figure 1) and is defined by its unique con-
junction of a kind of operation with a kind of
content and a kind of product. Each ability 
is designated by a trigram. For example, CMU 
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stands for cognition of semantic units, which 
means awareness or understanding of ele- 
mentary constructs, such as objects thought 
about. CMU is most clearly assessed in in- 
dividuals by means of a vocabulary test. An- 
other example is MFI, which stands for mem- 
ory for figural implications, and is best as- 
sessed by paired-associates memorizing of 
pairs of figures. The individual learns con- 
nections between members of the pairs so that 
one member comes to imply the other. EST 
stands for evaluation of symbolic transforma- 
tions, which means rendering judgments con- 
cerning changes in information that is com- 
monly in the form of letters and numbers. 
This ability is known to apply in judging 
whether one algebraic expression is correctly 
transformed into another, as by factoring, 
transposing, or substituting events. 

It may have been noted that the first letter 
of a trigram stands for the kind of operation, 
the second for the kind of content, and the 
third for the kind of product. In each case, 
the letter is the initial of the category name 
(D = divergent production, B = behavioral, 
R = relation, and so on), except that N 
stands for convergent production (C was pre- 
empted by cognition), and M stands for se- 
mantic (S was preempted by symbolic). 
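
The naming rule can be sketched as a small lookup, which may help readers decode the trigrams used in the rest of the discussion. The letters and category names are those given in the text; the function and variable names are hypothetical, and the only liberty taken is to render the memory operation with "for" rather than "of", following the wording "memory for figural implications".

```python
from itertools import product

# First letter: kind of operation.
OPERATIONS = {
    "C": "cognition",
    "M": "memory",
    "D": "divergent production",
    "N": "convergent production",  # C was preempted by cognition
    "E": "evaluation",
}

# Second letter: kind of content.
CONTENTS = {
    "F": "figural",
    "S": "symbolic",
    "M": "semantic",  # S was preempted by symbolic
    "B": "behavioral",
}

# Third letter: kind of product.
PRODUCTS = {
    "U": "units",
    "C": "classes",
    "R": "relations",
    "S": "systems",
    "T": "transformations",
    "I": "implications",
}


def decode_trigram(trigram):
    """Spell out an SI trigram such as 'CMU' in words."""
    if len(trigram) != 3:
        raise ValueError("an SI trigram has exactly three letters")
    op, content, prod = trigram.upper()
    if op not in OPERATIONS or content not in CONTENTS or prod not in PRODUCTS:
        raise ValueError("unknown SI trigram: " + trigram)
    joiner = "for" if op == "M" else "of"
    return "{} {} {} {}".format(
        OPERATIONS[op], joiner, CONTENTS[content], PRODUCTS[prod]
    )


def all_trigrams():
    """Every cell of the model: 5 operations x 4 contents x 6 products."""
    return ["".join(cell) for cell in product(OPERATIONS, CONTENTS, PRODUCTS)]


if __name__ == "__main__":
    for code in ("CMU", "MFI", "EST"):
        print(code, "=", decode_trigram(code))
    print("cells in the model:", len(all_trigrams()))
```

Because the same letter can mean different things in different positions (M is memory as an operation but semantic as a content), the lookup is positional rather than a single table.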

Meanings of the 15 categories of the model 
(5 operations, 4 contents, and 6 products) 
should become clear from the discussions, 
which include descriptions of their empirical
referents, the psychological tests. Space is not 
taken to present formal definitions here. A 
detailed treatment may be found in the 
writer's book (Guilford, 1967). 


THURSTONE FACTORS

We consider each of Thurstone’s better-
known PMA factors that he reported during
the interval from 1938 to 1951. They include
what he called verbal, space, number, percep-
tion, memory, induction, deduction, word
fluency, and his two closure factors C1 and
C2. We look at his verbal factors first.


Verbal Factors 


The verbal factors found in most of Thur-
stone's analyses would now be regarded as
composites, each a confounding of a number
of semantic abilities. For example, in his
first PMA analysis (Thurstone, 1938a), the
factor that he characterized as “verbal
relations” had loadings of .38 or above for a
variety of tests of types whose dominant SI
factors have been found to be abilities other
than CMU (cognition of semantic units), in
SI terminology. Three tests
should have represented CMC (cognition of 
semantic classes). They were Reading I, 
Reading II, and Word Grouping. Two tests 
are now known to represent DMR (divergent 
production of semantic relations): Controlled
Association and Inventive Opposites. The lat- 
ter also has some variance from NMR (con- 
vergent production of semantic relations) be- 
cause it restricts each item to only two re- 
sponses, each with a given initial letter. The 
divergent aspect probably comes from the 
fact that alternative responses are to be 
given. It was the presence of these two tests 
that enabled Fruchter (1948) to find a factor 
of associational fluency (DMR) in his re- 
analysis of Thurstone’s data. It also enabled 
Zimmerman (1953) to report a similar factor 
from the same data. 

Representing other SI abilities were Verbal
Analogies (cognition of semantic relations,
CMR) and False Premises (evaluation of se-
mantic implications, EMI; and evaluation of
semantic relations, EMR). Only the two vo-
cabulary tests—Vocabulary (Chicago) and
Vocabulary (Thorndike)—are of the kind
that commonly represents the best-known
verbal ability CMU, but in Thurstone’s first
analysis they had even stronger loadings on
other factors. In later analyses, however, he
found his verbal factors narrowed more and
more to vocabulary and reading-comprehen-
sion tests, which others have found to repre-
sent faithfully SI ability CMU.


Space Factors 


Thurstone (1938a) first described his space
factor as a “facility in spatial and visual
imagery [p. 80].” This definition seems to fit
the factor called “visualization” in the AAF
research (Guilford & Lacey, 1947) and finally
given the SI designation of CFT (cognition of
figural transformations). The AAF program
found repeatedly a second space factor later,
called “spatial orientation,” which was de-
fined as an ability to comprehend arrange-
ments of objects in visual space. Of the 13
tests loading .39 and above in Thurstone’s
first analysis, 5 would seem to represent CFT.
Surface Development, Block Counting, and
Form Board have been found to do this in
later analyses by the ARP and others
(Michael, Zimmerman, & Guilford, 1950).
Lozenges A and Lozenges B appear to fit the
current conception of CFT, which is the
ability to imagine changes in visual objects,
such as movements, rotations, reversals, or
rearrangements. Two tests in Thurstone’s list
seem to represent SI ability CFS (cognition
of figural systems), which was defined above
with the name of spatial orientation—his
Flags and Cubes. Both of these tests came
out on a factor identifiable as CFS in the
analysis by Michael et al. (1950).

3 It is possible to make statements regarding the
probable SI abilities represented by Thurstone's tests
because some of the same tests or their alternate
forms have been analyzed by the ARP. Others are
similar to other ARP-analyzed tests.

Not until much later did Thurstone come
to recognize more than one space factor, as
seen in his 1950 report, in an analysis de-
voted primarily to visual-space tests. In that
study he found what he regarded as four
space factors. Factor S1 he defined as the
“ability to visualize a rigid configuration when
it is moved into different positions.” The tests
were Figures and Cards, in both of which
the examinee (the subject) is to say whether
two simple figures having the same shape and
containing holes could be showing the same
side or whether one would have to be turned
over. They are essentially two forms of the
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hension in the AAF research), and Paper 
Puzzles. The latter is like the Minnesota 
Form Board test, which has a legitimate claim 
to represent CFT. 

The appearance of two different factors 
both based strongly on tests representing CFT 
requires explanation. Thurstone's distinction 
between the two was in terms of movement of 
the entire object versus movement of parts 
within the object. Because at least three of
the tests for S1 were sufficiently similar to be
actually alternate forms of the same test, there
is much suspicion that S1 is primarily a
specific component. Such tests should be ex-
pected to have significant loadings on S2 as
well as on S1, but in the oblique solution they
have loadings of essentially zero on S2. In an
orthogonal solution, the loadings of these
tests on S2 might be significant. The correla-
tion reported between S1 and S2 was only .38,
however, which throws some doubt on this 
suggestion. There may be some actual differ- 
ence between abilities to imagine transforma- 
tion of wholes versus of parts, but this find- 
ing of Thurstone's stands unique and needs 
verification. 

Thurstone’s factor S3 in his 1950 analysis 
was represented by only two tests, but they 
can qualify as tests of the SI ability CFS or 
the AAF spatial orientation. Thurstone did 
not interpret this factor except to put it in 
the space category. Its leading test Cubes 
has a history of loading on factors that rather 
clearly represent ability CFS (Michael et al., 
1950, 1951). Lozenges A could also well have 
some relation to CFS. But both tests may 
also have some relations to CFT.

The AAF research found a factor called S2
(not to be confused with Thurstone’s S2), 
which it interpreted as kinesthetic in nature. 
It was common to Thurstone’s test called 
Hands and his test Flags. In Hands, a large 
number of sketches of the human hand are 
shown, each with the hand in a different 
position. The subject is to say whether the 
picture represents a right or a left hand. In 
Flags, the subject is to say whether the
same side of the United States flag is showing
in two paired views, with the flag rotated at
different angles. Examiners administering these
tests have observed that subjects often make
at least minimal hand movements in solving
the items.

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In his 1950 analysis, Thurstone found a 
factor K, with Hands leading on it and with 
Bolts as its helpmate. In each item of the 
latter, a bolt with a right-hand screw is par- 
tially screwed into a block of wood that is 
tilted in a certain position. The subject is to 
say in which of the two directions the bolt 
must be turned in order to drive it further into 
the block. Thurstone also reported the ob- 
served involvement of the subjects’ hands, 
which led him to conclude that the factor 
"represents kinesthetic imagery." 

It can be added that Thurstone also ana- 
lyzed the same battery of space tests along 
with additional tests thought to involve me- 
chanical knowledge. So far as the space 
factors were concerned, the results were es-
sentially the same as those in the 1950 analy-
sis.


Factors Named “Perceptual” 


In his first 1938 analysis, Thurstone 
(1938a) said of a factor that he called sim- 
ply “perceptual” that it was a “facility in 
perceiving detail that is embedded in ir- 
relevant materials [p. 81]." His interpreta- 
tion apparently rested very heavily upon the 
one leading test, Identical Forms, which in 
recent history has always helped to mark 
factors for EFU (evaluation of figural units) 
and which, following AAF terminology, has 
most often marked a factor called “perceptual 
speed.” Identical Forms calls for judgments 
of whether each of five similar figures is or is 
not identical with a key figure, hence it should 
logically represent EFU. But on the basis of
present knowledge, the other eight tests on
the factor in Thurstone’s first analysis repre-
sent anything but EFU. Three tests represent
the verbal factor CMU—Vocabulary, Comple-
tion (a vocabulary test in completion form),
and Disarranged Sentences (given scrambled
words belonging to a sentence, the subject is
to read and comprehend the sentence). Two
tests, Verbal Classification and Word Group-
ing, have been found by the ARP to represent
alike two other abilities: CMC and NMC
(convergent production of semantic classes).
Another test appears to be primarily for
CFR, cognition of figural relations (Pattern
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Analogies, a figure-analogies test), another to 
be primarily for CMR (Verbal Analogies), 
and still another (Picture Recall) for MFU 
(memory for figural units) or MMU (memory 
for semantic units) or both, depending upon 
whether the subject remembers appearances 
of the pictures or what they represent in the 
way of real objects. 

The AAF psychologists had a decidedly
better basis for calling “perceptual speed” a
factor that coupled Thurstone’s Identical
Forms with their own tests, Spatial Orienta-
tion I, Spatial Orientation II, and Speed of
Identification. The first of these three in-
volved comparing and judging pairs of aerial
photographs for identity, the second involved
matching aerial photographs with places on
a map of the terrain, and the third involved
matching identical airplanes. Although, as in
Thurstone's earlier analyses, some non-EFU
tests tended to go with the list on the AAF
factor, as other tests representing other abil-
ities in common with those “foreign” tests
were included in analyzed batteries, the re-
striction of the perceptual-speed factor to
EFU-type tests became more definite. Inci-
dentally, this kind of outcome is an example
of how tests without support from other tests
of their stronger common factors in an analysis
go hither and yon, depending in part upon
chance-inflated correlation coefficients. Such
tests of underrepresented factors often con-
fuse the picture in an analysis, as they did in
many of Thurstone's early studies.

The AAF name for its factor, perceptual
speed, was in recognition of the fact that its
tests comprised easy items that almost any-
one could do perfectly if he has the time. In
the ARP research, such a factor was fre-
quently found, marked usually by Thurstone’s
Identical Forms and Part IV of the Guilford
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tests were designed, one almost a power test 
(Judgment of Size) and one a partially 
speeded test with items varying in difficulty 
(Judging Figural Combinations). In Judg- 
ment of Size, a central geometric form is to be 
compared with four surrounding figures of
the same shape, with the subject to say which
one is identical in size. In Judging Figural
Combinations, the subject is to say which of
five outline squares contains the same small
geometric figures as a standard set, where
there are variations of shapes, sizes, and num-
bers of small figures. These tests came out
with Thurstone’s Identical Forms on a factor,
thus linking the factor with history. Thurstone
was wrong in concluding that a feature of
the ability is “perceiving detail that is em-
bedded in irrelevant material [Thurstone,
1938a, p. 81],” and the AAF psychologists
were wrong in attributing a speed feature to
the ability.

Historically, there has been another issue
in progenitors for EFU. In a special analysis
of his perceptual ability, Thurstone (1938b)
found a factor identified as “perceptual” but
having no tests loaded significantly on it that
would now be placed in the figural-content
category. There were three tests that should
represent the parallel ability ESU (evaluation
of symbolic units)—Identical Numbers, Iden-
tical Names, and Scattered Ns. All three are
obviously in the symbolic category; matching
things for identity is an evaluative task; and
the products are units. Interestingly enough,
two other tests on Thurstone’s factor are of
the kind that are now known to measure the
ability EMU (evaluation of semantic units),
another parallel variable. Concrete Associa-
tion asks the subject to mark words in a list
that are clearly associated with a key word.
Verbal Enumeration asks the subject to check
words in a list that belong to a stated class.
Abstract Classification was of a similar nature,
another EMU test. Thus, what Thurstone
identified as his perceptual factor in this
analysis was apparently a confounding of
ESU and EMU. The operation and product
are the same as for EFU, but the contents
are different.

Similar confoundings occurred in other
Thurstone analyses (… Thurstone &
Thurstone, 1941). The 1941
analysis brought four CSU (cognition of sym- 
bolic units) tests into the picture with two 
EFU tests and one EMU test on the same 
factor. The four CSU tests were Mirror Read- 
ing, Word Puzzles, Incomplete Words, and 
Identical Numbers. Mirror Reading required 
the subject to recognize a word as it would 
be seen in a mirror. Word Puzzles was an 
anagrams test, the words being presented with 
scrambled order of letters. The two EFU tests 
were Identical Pictures and Faces, both re-
quiring matching identical objects. Verbal
Enumeration was described as an EMU test
earlier.

Bechtoldt (1947), a student of Thurstone, 
seems to have succeeded in separating the 
three SI evaluation-of-units factors in his 
analysis. His factor C, which he interpreted
as “speed of recognition of predetermined
symbols in contexts of discrete distractors
[French, 1951, p. 65],” by its conjunction of
tests justifies identification with ESU. Note
his reference to symbols in his statement. In
three tests, the subject was to cross out speci-
fied letters or numbers within mixed sets of
other letters and numbers.

His factor A was described as “fluency of 
associational recognition with perceptual ma- 
terials [French, 1951, p. 65].” The qualifica- 
tion “perceptual” is misleading, for the con- 
tent of the tests is definitely semantic, not 
figural. The strongest test for the factor was 
Word Checking, which asked the subject to 
check words that belong to a specified class, 
for instance, things growing and smaller than 
a football. Other tests were Unfinished House 
(“check listed words associated with an un- 
finished house”) and Verbal Enumeration, 
which was previously described. Still other 
pertinent tests were Opposites (“in a long list,
mark every pair of words that includes op-
posites”), Size Comparison (“in pairs of
named objects, mark the larger of the two”),
and Boys’ Names (“in a list of words, check
names of boys”). All these tests logically
qualify for ability EMU.

Bechtoldt's factor Y was interpreted by him 
as “facility in organizing simultaneous visual 
configurations under distraction of continued 
act [French, 1951, p. 65].” Only the “visual
configuration” component of this description 
really fits the picture of the factor. Thurstone's 
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Identical Forms helped to identify the factor, 
but with a barely significant loading. Stronger 
tests included Shape Constancy (involving 
matching and equating pairs of objects for 
shape), Picture Squares (involving finding a 
pair of identical figures in a matrix of 16 
figures), and Two-Hand Coordination, which 
seems to have no claim to being a representa- 
tive of EFU, in spite of its having the highest 
loading of .48. 


Numerical Factors 


In his extensive review of factor analyses 
involving intellectual abilities, French (1951) 
remarked that the Thurstone factor of num- 
ber or numerical facility was among the best 
established. There would have been much 


duce (Guilford, 1959). Tw
however, called for re 
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symbolic implications). It is apparent that 
the Digit Symbol test involves newly learned 
connections between digits and symbols and 
that the goodness of learning and memory is
an important asset in gaining speed in that
test. Numerical-operations tests reflect not
recent and controlled learning but older and
much less controlled learning. Confirmation
of the relation of numerical-operation tests to
MSI was forthcoming in a new analysis in a
study of mathematical aptitude (Guilford,
Hoepfner, & Petersen, 1965). But in the same
analysis there was also a relation to NSI, thus 
also supporting the original placement of 
numerical-facility factors in the SI model. 
There is further information that throws
light on the original Thurstone number factor
and most of such factors found later. That is
to the effect that they have been heavily con-
founded with a number-operation specific. It
has usually happened that, as in Thurstone's
analysis, more than one numerical-operation
test has been used. In the AAF analyses, two
number tests were included, and the loadings
were near .80. In other words, they were
specific-inflated. Tenopyr made a special
study of symbolic-memory factors, in which
she introduced several new tests designed for
MSI along with four numerical-operation
tests. The result was a very strong number-
specific factor featuring the number tests,
only one of which (Addition) had also a sig-
nificant loading on the common factor MSI.
In a reanalysis of the same battery, only one
numerical-operation score (a composite from
four number tests) was included, so as to
avoid involvement with a number-specific
component. The score went on MSI, but with
a loading of only .35. The natural conclusion
is that the number factors of historical note
have been largely specific variables, unique to
tests requiring operations with numbers, with
possible confoundings with certain factors
representing SI abilities, depending upon the
composition of the test battery. The common-
factor affiliations for numerical-operation tests
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seem to be with MSI and NSI, although 
weakly in both instances. Numerical-opera- 
tion tests therefore seem to measure mostly 
specific number-operation skills. 


Memory Factors 
Thurstone’s (1938a) first analysis reported 
a memory factor, which he apparently re- 
garded as possibly only one of a family of 
abilities, for he suggested possible factorial 
differences to be expected between incidental 
and intentional memory, rote and logical 
memory, and memory within different sense 
modalities. The three leading tests on his 
memory factor were all of the paired-associ- 
ates variety—Number-Number, Word-Num- 
ber, and Initials (associated with family 
names). It will be recognized that the con- 
tent is symbolic, and what is learned is in 
the nature of implications. The dominant SI 
ability is therefore MSI, an ability that was 
just discussed in connection with the number 
factors. There were two other tests loaded on
the factor, Figure Recognition and Word 
Recognition, which probably represent abilities 
MFU and MMU (the latter assuming that 
word meanings were stored), but possibly 
also MSU (memory for symbolic units), if 
appearances of words in terms of letter con- 
tent were stored. Paired-associates tests in 
recall form, which he used, can also involve 
memory for units, so MSU could have been a 
confounding component in his memory factor. 
From the 1940 analysis, a clear-cut factor,
interpretable as MSI, was reported, with two
good tests on it. They were Initials and Word-
Number, two MSI tests from the first analysis.
In the 1941 analysis, the memory factor was
represented by three tests for three different
SI memory abilities. Figure Recognition un-
doubtedly stood for MFU, Digit Span for
MSS (memory for symbolic systems), and
First Names (another paired-associates test)
for MSI. The cohesion of memory tests as a
distinct group is some evidence, but not very
comforting evidence, for the uniqueness of the
memory operation category. Recent analyses
in the ARP program have also revealed that
differentiation of abilities by factor analysis
within the memory category is not easy.
This difficulty is attributed to lack of
experimental control of what content and
product are cognized and then stored when
the subject memorizes his items of informa-
tion, more than to actual correlations among
memory abilities.

In his 1950 analysis, Thurstone was con- 
cerned mainly with spatial abilities, hence he 
featured visual-figural tests. Two of them— 
Memory for Pictures and Memory for Geo- 
metric Designs—generated a common factor 
that he regarded as being distinct from his 
earlier memory factor. Both tests were in 
recognition form and were designed so that 
verbalizing should not help examinees very 
much. The SI ability suggested is MFU. One 
surprise was that a test called Visual Memory 
did not also go on the factor. The name of 
that test is misleading, however. Two frames 
for exposure, each with an irregular figure on 
it, were flashed on a screen with a very short 
blank interval between them (time interval 
not stated), with the subject to say whether 
or not the second is identical with the first. 
Despite the condition of successive exposure 
of the two figures to be compared, the test 
probably represents EFU and should not be 
expected to go for MFU. The nature of the 
factor on which Visual Memory did go, with 
one other test, cannot be interpreted, but it 
could have been for EFU. This is suggested 
by the need for comparison and judgment of 
two figures as to identity, although exposed 
in quick succession rather than simultaneously 
as in the usual EFU tests. 


Induction Factors 

In different analyses, Thurstone found what 
he called an “induction” factor that rather 
clearly corresponds to SI ability CSS (cogni- 
tion of symbolic systems), although in each 
case one or two extra tests of other SI 
abilities went along, apparently for lack of 
any better place to go. The clearly CSS tests 
were Number Series, Letter Series, Marks, 
Number Patterns, Tabular Completion, Secret 
Writing (using a letter code), and Letter 
Grouping. All these tests involve seeing some 
kind of systematic arrangement of symbols 
in order to solve problems, and their relations 
to CSS have been repeatedly demonstrated 
by the ARP. 

In the opinion of the writer, both the terms 
"reasoning" and "induction" are much too 
broad as labels for the ability CSS, which is 
only one of a great number, all of which 
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could be defended as reasoning abilities and 
many of them as induction abilities. It has 
been proposed elsewhere (Guilford, 1960) that 
the term "induction" could be applied to most 
of the cognition abilities, of which there are 
24 in the present SI model. It could apply to 
all; an explanation follows. 

If induction is regarded as abstracting or 
extracting general information from particu- 
lars, the term can be defended as applying to 
all of the six products of information. Units 
have attributes by which they are identified, 
and those attributes become known through 
repeated encounters with objects that possess 
them. Classes, relations, systems, and trans- 
formations are transposable entities. As such, 
they are derived through encounters with dif- 
ferent individual cases that embody them. Im- 
plications come about through replicated en- 
counters, to which the law of frequency in 
learning applies. By such reasoning, the con- 
cept of induction can be replaced with the 
operation category of cognition, with many 
advantages. The disposal of “deduction” is 
executed next. 


Deduction Factors 


On both occasions in which Thurstone rec- 
ognized deductive factors (1938a, 1940), two 
syllogistic tests were significant markers, al- 
beit along with some nonsyllogistic tests in 
both cases. In the first case, the syllogistic 
tests were False Premises and Syllogisms. The 
two were perhaps sufficiently different to be 
accepted as two different tests rather than 
two forms of the same test. The main differ- 
ence was the use of nonsensical premises in 
the first case, such as, “All truants are gold- 
fish,” and realistic premises in the second 
such as, “Jones is older than Brown.” In the 
second analysis, the two syllogistic tests
loaded on the deduction factor were known
as Reasoning II and Reasoning III. The dif-
ference appeared to be in the complexity of
reasoning in the two cases.

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The operation is evaluation rather than con- 
vergent production because the subject need 
not draw his own conclusions; they are given 
to him. A conclusion is an implication from 
the premises, therefore we should readily ac- 
cept the involvement of EMI. The involve- 
ment of EMR must arise from the fact that 
premises and conclusions often state rela- 
tions. Relations are obvious in statements 
such as “Jones is older than Brown," the rela- 
tion being “older than.” Relations are not so
obvious in other kinds of propositions, but
subjects possibly process the statements as
such.

It will probably be agreed that “deduction”
should apply to the act of drawing conclu-
sions, not the act of judging them. It should
take a test in completion form, requiring the
subject to produce his own conclusions, to
justify the label of “deduction.” The ARP
has used such tests, and they go mainly on
the factor for NMI, a convergent-production
ability (Merrifield, Guilford, & Christensen,
1962). There is sometimes a little EMI vari-
ance, enough to be significant, which should
mean that the subject's evaluations of his
own conclusions offer some help in making a
good score.

As a process, deduction should be assigned
logically to convergent production, not evalua-
tion. The writer has proposed elsewhere (Guil-
ford, 1960) that the four convergent-produc-
tion-of-implications abilities, NFI, NSI, NMI,
and NBI, which are concerned with implica-
tions, be regarded as deductive abilities.
He also proposed that the four abilities con-
cerned with convergent production of rela-
tions—NFR, NSR, NMR, and NBR—also
involve deducing conclusions, in thinking by
analogy. Further thinking about the matter
leads to the suggestion that all the convergent-
production abilities should be accepted as
deductive, in which case that operation may
well replace completely “deduction” as a
process. Then distinct varieties would be
recognized, and usefully so. Both the concepts
of “induction” and “deduction” have hereto-
fore been very ambiguous. The cognition and
convergent-production operations are not only
unambiguously defined, they also have em-
pirical referents in the form of tasks that
involve them.
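
The proposal to read "induction" as the cognition operation and "deduction" as the convergent-production operation can be checked by enumerating the cells of the model that each operation covers. The sketch below assumes only the letter codes used in this article (four contents, six products); the function name `abilities` is hypothetical.

```python
CONTENTS = "FSMB"    # figural, symbolic, semantic, behavioral
PRODUCTS = "UCRSTI"  # units, classes, relations, systems, transformations, implications


def abilities(operation, products=PRODUCTS):
    """List the SI trigrams for one operation letter.

    Restricting `products` to a single letter gives the abilities for one
    kind of product, for example "I" for implications.
    """
    return [operation + c + p for p in products for c in CONTENTS]


if __name__ == "__main__":
    # "Induction" read as cognition: one ability per content-product pair.
    print(len(abilities("C")), "cognition abilities")
    # "Deduction" read as convergent production.
    print(len(abilities("N")), "convergent-production abilities")
    # The two narrower sets first proposed as deductive.
    print("implications:", abilities("N", "I"))
    print("relations:", abilities("N", "R"))
```

On this reading each term covers one operation across every content and product, rather than naming a single factor.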
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Factors Leading to General Reasoning 

Although not recognizing it as a reasoning 
ability, Thurstone (1938a) reported a factor 
that was a possible forerunner of what later 
became known as “general reasoning” in the 
AAF program of analyses. The list of tests 
for it was headed by Arithmetic Reasoning, 
with a loading of .58. Thurstone character- 
ized the psychological trait as the ability to 
solve problems when there is “restriction in 
solution." The other six tests on the factor 
were heterogeneous with regard to probable 
SI content, however. There were three vo-
cabulary tests (although another factor had 
been tagged as the verbal factor), a CFT test 
(Mechanical Movements), a CSU test (Spell- 
ing), and a numerical test (Numerical Judg- 
ment). 

In AAF research, it became customary to 
conclude that any factor for which an arith- 
metical-reasoning test had a high loading 
represented “general reasoning.” The qualification
“general” was in recognition of the
fact that quite a variety of tests often loaded
significantly on the factor.

A series of analyses on reasoning abilities
by the ARP ended with one directed especially
at the nature of the general-reasoning
factor. Alternative hypotheses as to its nature
were generated and investigated, with the
outcome supporting best the idea that it is
the ability to see the structure of problems
(Kettner, Guilford, & Christensen, 1956).

Later, the factor for general reasoning was
placed in the cell for CMS (cognition of
semantic systems) of the SI model. A verbally
stated arithmetical problem conveys to
the problem solver a somewhat complex conception
of a structure or system, conceived
semantically. There are, of course, many other
kinds of semantic systems.


Word-Fluency Factors

From his first analysis, Thurstone reported a
factor whose psychological character he
described as “fluency in dealing with words”
(Thurstone, 1938a). He recognized that the
meanings of the words in this connection were
not of any importance; only [...]
features were relevant. His list of tests for
the factor, however, contained only one that
would today satisfy the identification of
either word fluency or SI ability DSU (divergent
production of symbolic units), which
most word-fluency factors of the past have 
approached. The test in question was First 
and Last Letters, in which the subject is to 
list words beginning with specified first and 
last letters. It had the smallest significant 
loading (.39) among the tests reported for 
the factor. Leading on the factor were three 
tests of types that are now known to represent 
ability CSU, cognition of symbolic units, not 
divergent production of the same. Two of the 
three were anagram tests (Anagrams and Dis- 
arranged Words), and the third was Spelling. 
The type of spelling test used included words 
some of which were misspelled, with the 
subject to say which ones. Twice in ARP 
analyses such a test went on the factor for 
CSU along with anagrams tests and others. 

In Thurstone’s (1940) analysis, the factor 
that he identified as Word Fluency now ap- 
pears to have been a confounding of CSU 
and CSS. The CSU tests were Mirror Read- 
ing and Disarranged Words, which have been 
described before, and the CSS tests were Letter 
Series and Number Series, which had mini- 
mally significant loadings on the factor. 

In the Thurstone and Thurstone (1941) 
analysis, 5 good DSU tests were among the 
strongest of 10 tests on the factor identified 
as word fluency. They were Prefixes, First 
Letter, Suffixes, Rhyming Words, and First 
and Last Letters, all of which call for listing 
of real words fulfilling specified class prop- 
erties that their names indicate. That this 
factor was confounded with CSU again is 
indicated by three good-looking CSU tests in 
the list: Anagrams, Four-Letter Words, and 
Word Puzzles (an anagrams test). In Four- 
Letter Words, such words are hidden in con- 
tinuous rows of otherwise pied type. It is a 
matter of recognizing familiar words under 
difficult conditions, as is true of most other
CSU tests.


Closure Factors 


Thurstone’s two closure factors, C1 and 
C2, have occupied the attention of a number 
of investigators since he reported them in
1944. The study was regarded as exploratory,
as were most of the earlier analyses. Although
the tests were restricted to visual input,
the question was raised as to whether
such factors as were to be expected would
transcend sensory modalities. Here we con- 
centrate attention on three of the new factors, 
two of which were identified as closure abil- 
ities, where closure means the integration of 
sensory input so as to differentiate figure from 
ground. Much attention was given to the 
question of whether it makes a difference if 
the closure takes place with and without dis- 
tractions or detracting influences. 

In the results from the Thurstone (1944) 
investigation were the bases for what became 
known as Closure 1, or the “speed and 
strength of closure," and in SI placement, 
ability CFU-V (where CFU is the cognition 
of figural units and the added “V” indicates 
that the ability pertains to visual input). This 
qualification is sometimes needed to remind us 
that there is another factorial ability recog- 
nized as CFU-A, for auditory input, from a 
factor reported by Fleishman, Roberts, and 
Friedman (1958). The latter deals with such 
tasks as recognizing radio-code signals and 
sets of dot sounds. In what follows, it should 
be understood that CFU refers to CFU-V, 
for we are not concerned further with CFU-A. 

It was stated earlier that Thurstone had 
actually found a factor such as CFU without 
realizing what it was. In his 1944 analysis, 
curiously, two factors looked like CFU. He 
declined to name his factor A, which he de- 
scribed as the ability to form a perceptual 
closure against some distraction and to hold 
that closure against detracting influences. The 
four CFU tests (as shown by later ARP ex- 
perience) were Hidden Digits, Street Gestalt 
Completion, Dotted Outlines, and Mutilated 
Words. All of these tests except Dotted Outlines
were described earlier. The dotted outlines
were of 6-inch [...] with [...]. Other
[...] factor [...] CFT tests (PMA
Space and Kohs Blocks) and two NFT (convergent
production of figural transformations)
tests (Gottschaldt Figures A and Gottschaldt [...]).
[...] confounding also with the factor
[...], for the leading test with a loading
of .54 was Shape Constancy, which was reported
in connection with that ability earlier.
Two other factors in the analysis seem to 
have better claims to interpretations as CFU 
and NFT, however, which leaves factor A 
something of a mystery. 

Thurstone called his factor F “speed of
perception,” which should not be confused
with earlier factors going by a similar name.
Here the two rather faithful markers for CFU,
Street Gestalt Completion and Mutilated
Words, had higher loadings than for factor
A (.53 versus .35 and .44 versus .34, respectively).
New tests, with even stronger loadings
on factor F, were Peripheral Span and
Dark Adaptation. Both involved perception of
single letters (hence figural content) under
difficult viewing conditions, flashed in the
periphery of the subject's visual field. There
were only two discordant facts against accepting
factor F as CFU. These facts were the
absence of Dotted Outlines and Hidden
Digits from the list of significant tests. In a
recent analysis (see Footnote 6), the ARP
did find that a test Hidden Print (like Hidden
Digits except that letters were also used)
was one of the strongest tests for CFU.

The same ARP analysis threw much doubt
upon the assumed need for distracting or detracting
material working against closure or
even need for closure at all. A new test,
Close-Ups, composed of close-up photographs
of segments of familiar objects, such as a
key or a pineapple, was quite strongly loaded
on the factor. There was no need to integrate
unorganized elements into objects, and there
were no distracting elements. The same can
be said regarding Thurstone's two peripheral-vision
tests, in which the objects were complete,
isolated letters.

Closure 2 is the identification for Thurstone's
1944 factor E. Three NFT tests were
conspicuous in the list, namely, Hidden Pictures
(with human and animal forms concealed
in a landscape) and the two Gottschaldt
Figures tests, mentioned above. The
loadings for the latter were just about equal
to their loadings on factor A. The nature of
the leading test, Two-Hand Coordination, is
difficult to hypothesize in SI terms. There is
little in the description of it to suggest NFT.
The principle of the test was to measure the increase in
difficulty in tapping with the two hands simultaneously
as compared with tapping with each
hand separately, in a task from which one 
should expect a large interference effect. In 
interpreting his factor E, Thurstone used ex- 
pressions such as "ability to shake off one 
set in order to take a new one," “freedom 
from Gestaltbindung," and “flexibility in 
manipulating several more or less irrelevant 
or conflicting gestalts." Flexibility seems ob- 
viously involved in many of the tests. The 
kind of flexibility would seem to be well de- 
scribed by SI ability NFT, convergent pro- 
duction of figural transformations. This con- 
cept is more precisely defined and is perhaps 
the core of Thurstone's 1944 factor E. 


RECAPITULATION 


In summarization, we may consider what 
structure-of-intellect abilities had their germs 
in the findings of Thurstone’s factor analyses 
and in his insights regarding outcomes. It is 
also of interest to see what SI abilities are 
represented in his published PMA tests. 


Forerunners of SI Abilities 


Thurstone’s analyses cannot be justifiably 
cited as supporting evidence of the validity of 
SI theory or the abilities involved, because
rarely was any ability, now recognized as
unique, found unconfounded in his results.
But his insightful descriptions of his hypothetical
psychological variables were very suggestive,
and the later use of many of his tests
in following out implications from his results
was often fruitful. Thurstone was free to admit
that there were historical antecedents for
some of his factors, for instance, verbal, space,
number, and memory. Others had strong
claims of being novel findings.


The verbal factors in Thurstone’s early
analyses showed relations to numerous tests,
almost all verbal, but representing a number
of semantic abilities. As time went on, the
factors converged toward a variable interpretable
as CMU.

His space factor was undoubtedly closer
to what was later called visualization than
to the spatial orientation from AAF research.
In other words, it led to ability CFT rather
than CFS. He found some signs of the
latter in his last reported analysis of space
tests.

Concerning his “perceptual” factors there
was much confusion. Although out of them
came the AAF ability "perceptual speed," 
later to be identified with EFU, Thurstone's 
analyses were perpetually plagued with fail- 
ures to discriminate EFU, ESU, and EMU, 
which differ only in terms of informational 
content. Two or more of the three were 
usually confounded in his results. 

The number factor, with Thurstone as with 
others, has proved to have been largely a 
specific affair, a matter of number-operation 
skills. When only one numerical-operation 
test is included in an analysis, to avoid the 
possibility of confounding with specific vari- 
ance, such a test shows some small common- 
factor components identifiable as MSI and 
NSI—memory and convergent production 
with symbolic implications, respectively. 

Rather consistently and unusually clearly, 
Thurstone brought out a factor identifiable 
as MSI, because in each analysis he used two
or three different tests of the paired-associates 
type. The units of information to be associ- 
ated were numbers, letters, or names, hence 
they were symbolic information. Implication 
is essentially a new concept replacing as- 
sociation. In a late analysis, Thurstone found 
signs of another memory ability that was 
probably MFU, for it involved recognition 
of pictorial information. 

One of the clearest SI abilities found by 
Thurstone was for CSS (cognition of sym- 
bolic systems). He identified it as "induction," 
but in earlier discussions it was argued that 
many or all of the abilities in the cognition
category of the SI model could be regarded 
as instances of induction. 

He thought that he had demonstrated a 
“deduction” ability primarily by the use of 
multiple-choice syllogistic tests, but results 
in ARP research have shown that such tests 
are related to both SI abilities EMT and 
EMR. In other words, Thurstone’s deduction 
factor was a composite of two SI abilities, 
and they have to do with evaluation, not with 
drawing conclusions. Drawing conclusions is a
matter of convergent production, and it was 
recommended that we substitute the latter 
operation for the concept of deduction. 

Without realizing it, Thurstone found a 
factor that could be cited as a forerunner of 
the AAF “general reasoning” and the SI
ability CMS. At least the factor was led
twice by an arithmetical-reasoning test, which 
has consistently marked that factor. Before 
it was given the CMS identification, it was 
recognized as an ability to comprehend prob- 
lems. Arithmetical problems, and others, are 
conceived by the problem solver as systems. 

Thurstone’s word-fluency factor, based most 
often on anagrams tests, was undoubtedly a 
much better candidate for CSU than for DSU. 
After the first analyses, some added tests that 
clearly represent DSU also came out on the 
factor, indicating that it was a confounding 
of CSU and DSU. Both Fruchter (1948) and 
Zimmerman (1953) showed that another 
fluency factor was latent in Thurstone’s 
earliest analysis, represented by tests for SI
ability DMR. 

Thurstone’s analysis of perceptual tests led 
to two closure factors, C1 and C2, and 
eventually to SI abilities CFU and NFT. Although
he regarded C1 as a matter of strength
of closure against detracting material, the 
latest ARP analysis in which CFU appeared 
throws serious doubt upon the distraction 
feature and even on the closure feature. A 
more accurate description of the trait is to 
call it the ability to recognize familiar visual 
objects under conditions of limited input. 
The essence of NFT is an ability to produce 
required revisions of visual objects. It often 
involves tearing down old objects in order to 
form new ones. It does not seem to be a skill 
for handling conflicting gestalts, as Thurstone 
supposed. 


SI Abilities in Published PMA Tests 


The SI abilities that are featured in the 
PMA tests may vary somewhat from one age 
level to another, but above the 5- to 7-year age
level the dominant abilities seem to be rather 
clear. The vocabulary tests for the “verbal” 
ability should measure status in CMU. The 
Space test appears to be primarily for ability 
CFT or visualization, where movements or 
other changes are cognized. The test Percep- 
tion should measure EFU, the evaluation of 
visual-figural units. Word Fluency is very appropriate
as a measure of DSU, divergent production
of symbolic units.

Where the Reasoning test has two kinds of
items in it, two different SI abilities are featured.
Letter-series items assess ability CSS,
cognition of symbolic systems, but items from
the original Letter Grouping test should mea- 
sure CSC (cognition of symbolic classes) as 
well as CSS. In the latter test, the experi- 
menter has to see symbolic classes after seeing 
the principle of each letter set. 

The PMA Number test is even more com- 
plex factorially. Its common factors are MSI 
and NSI, for remembering and producing 
symbolic implications, but there is a strong 
specific component of number-operation skills. 
The importance of number skills in education 
and in other connections in our society, of 
course, cannot be denied, so that numerical- 
operation tests should continue to be useful. 
But for measures of the generalized skills 
known as MSI and NSI, there are much 
better tests, stronger and more univocal. 

The PMA Perceptual test included at the 
lower age levels should be for ability EFU, 
evaluation of figural units. The Memory test 
at the highest age level is for MSI, memory 
for symbolic implications, or what has been 
known historically as associative memory. 
There are no PMA tests representing the 
“deduction” factor in the PMA battery, at 
any age. Neither of the closure factors is
represented, perhaps because they came after 
the battery had been adopted. 

Taking a more general look at the PMA 
battery, to note what kind of coverage of SI
abilities was accomplished, we see that there 
are three cognition abilities represented— 
CFT, CSS, and CMU. Three other operation 
categories have one test each: MSI, DSU,
and EFU. In terms of content categories,
there are two figural abilities, three symbolic,
and only one semantic. In terms of product 
categories, three tests pertain to units, one to 
systems, one to transformations, and one to 
implications. Thus, although the sampling of
categories has some breadth, there is still
severe limitation in coverage of the great
domain of human intelligence.
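
The tally in the paragraph above can be checked mechanically. The following is a minimal sketch, assuming the usual reading of the three-letter SI codes (operation, content, product); the function names `decode` and `coverage` are illustrative only, and the expansion of B as "behavioral" comes from the SI model in general rather than from this section.

```python
from collections import Counter

# Letter meanings of the three-position SI codes (operation, content, product).
OPERATIONS = {"C": "cognition", "M": "memory", "D": "divergent production",
              "N": "convergent production", "E": "evaluation"}
CONTENTS = {"F": "figural", "S": "symbolic", "M": "semantic", "B": "behavioral"}
PRODUCTS = {"U": "units", "C": "classes", "R": "relations",
            "S": "systems", "T": "transformations", "I": "implications"}

# Abilities attributed to the PMA battery in the paragraph above.
PMA_ABILITIES = ["CFT", "CSS", "CMU", "MSI", "DSU", "EFU"]


def decode(code):
    """Split a trigram such as 'CMU' into (operation, content, product)."""
    operation, content, product = code
    return OPERATIONS[operation], CONTENTS[content], PRODUCTS[product]


def coverage(codes):
    """Count abilities per operation, per content, and per product category."""
    decoded = [decode(code) for code in codes]
    return tuple(Counter(item[i] for item in decoded) for i in range(3))


operations, contents, products = coverage(PMA_ABILITIES)
print(dict(operations))
print(dict(contents))
print(dict(products))
```

Run on the six PMA abilities, the counts agree with the text: three cognition abilities, two figural, three symbolic, and one semantic ability, and three tests pertaining to units.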


COMPUTER SIMULATION OF SAMPLE SIZE AND EXPERIMENTAL
DESIGN IN HUMAN PSYCHOGENETICS 1


L. J. EAVES 2


Department of Genetics, University of Birmingham, England 


A method of computer simulation is applied to the investigation of problems con- 
nected with the genetic analysis of continuously variable behavioral characters in 
human populations. The efficiency with which various components of genetic and 
environmental variation can be detected is related to sample size. It is found that 
a convincing partition of the genetic variance into its additive and nonadditive 
components requires much larger samples than those frequently employed in human
psychogenetics.


Jinks and Fulker (1970) have shown how 
the techniques of biometrical genetics may be 
used to estimate the various components of 
genetic and environmental variation for human 
behavioral phenotypes. They suggested many 
experimental designs that enable these com- 
ponents to be estimated, and the possibilities 
have been illustrated by the reanalysis of 
published data. The need was stressed for the 
least squares estimation of variance compon- 
ents to permit tests of significance of the esti- 
mates, and a statistical test of the goodness of 
fit of the genetical model assumed. 

Two studies (Eaves, 1969, 1970) have in- 
vestigated the relative efficiency of selected 
experimental designs for genetic analysis. 
Three minimal sets of data were compared 
(Eaves, 1969) that satisfied the requirements 
of Jinks and Fulker (1970). In the first study, 
the designs were considered with regard to 
their ability to separate additive and dominant 
genetic variation, and, in the second study, 
two of the sets were discussed in more detail 
to compare the efficiency with which herit- 
ability might be estimated and the two prin- 
cipal sources of environmental variation dis- 
tinguished. The three sets were as follows: 


1 This work, which was carried out while the author 
was in receipt of a British Science Research Council 
Postgraduate Studentship, is part of a research program 
in psychogenetics supported by the British Medical 
Research Council. 

2 I am indebted to J. L. Jinks for advice and encouragement during the
preparation of the paper, and to J. S. Gale, M. J.
Kearsey, and B. W. Barnes for their helpful discussion
of many aspects of this work.

Requests for reprints should be sent to L. J. Eaves,
Department of Genetics, University of Birmingham,
P. O. Box 363, Birmingham 15, England.


Set 1: monozygotic twins reared together
(MZ_T), monozygotic twins reared apart (MZ_A),
and full siblings reared together (S_T);

Set 2: monozygotic twins reared together,
full siblings reared apart (S_A), and full siblings
reared together;

Set 3: monozygotic twins reared together,
full siblings reared apart, and half siblings
reared together (HS_T).


All three sets can provide six second-degree 
statistics from the within- and between-pairs 
analyses of variance for each of the three 
groups of relatives comprising a particular set. 
The six statistics enable the least squares 
estimation of the four main components of a
simple genetical model: the additive genetic
component ($D_R$), the nonadditive (dominance)
genetic component ($H_R$), the within-families
environmental component ($E_1$), and the between-families
environmental component ($E_2$).
The notation is derived from Mather (1949).
Set 3 was not discussed in detail because of
the likely disparity between environments for
siblings and half-siblings and is not considered
further here.

Both the earlier investigations were concerned
with the relative efficiencies of the different
sets of data for the estimation of various
meaningful combinations of the four parameters
of the basic model. It was shown that the
efficiency of an experiment for a selected criterion
was dependent on the relative proportions
of the three groups of differently [...]
pairs comprising a given minimal set. Since
the aim of these studies was the comparison
of the designs on an equal basis, a constant
experimental size was assumed in nearly every
case, and no attempt was made to relate
conclusions to any absolute experimental size
that might be required for the detection of a
given effect. It is essential before extensive and 
expensive data collection is undertaken that 
the knowledge gained from previous studies be 
related to the absolute experimental sizes 
necessary to detect particular genetic and en- 
vironmental influences. An attempt was made 
to satisfy this end by computer simulation of 
the least squares analysis of "data" con- 
structed for samples of differing size and com- 
position from hypothetical populations of 
known genetic composition and subject to 
known degrees of environmental influence. 


THEORY 

The application of the method of least 
squares to the estimation of components of 
variation is discussed by Hayman (1960) and 
Nelder (1960). A brief summary follows: 

If $x$ is a vector of independent statistics
derived by sampling a population, and $\theta$ a vector
of predictor variables, the linear model for
the prediction of $x$ in terms of $\theta$ may be written
$x = A\theta + e$, where $A$ is the matrix of the coefficients
of $\theta$ in the expectations of $x$, and the
elements of $e$ are independent and $N[0, \sigma_e^2]$,
that is, normally distributed with mean 0 and
variance $\sigma_e^2$. The elements of $A$ depend solely
on the structure of the model assumed, and in
the specific cases to be discussed, $A$ will consist
of the coefficients of $D_R$, $H_R$, $E_1$, and $E_2$ in
the expectations of the mean squares on which
the analysis is based. The expectations are
fully tabulated by Eaves (1969). For a given
set of statistics, for the assumed model $A$,
least squares estimates of $\theta$ may be readily
calculated. When the observed statistics are
variances and not known with equal precision,
it is appropriate to weight each second-degree
statistic by the amount of information corresponding
to it. For a given variance $x$, based
on $n$ degrees of freedom, the amount of information
about $x$ may be estimated from
$w = n/(2x^2)$. If the values of $w$ for the observed
statistics are written as the diagonal matrix
$W$, the maximum likelihood estimates of $\theta$,
denoted by $\hat{\theta}$, may be obtained from

$$\hat{\theta} = (A'WA)^{-1}(A'Wx).$$


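Each diagonal element of $(A'WA)^{-1}$ is the
variance of the corresponding $\hat{\theta}$, and the off-diagonal
elements are the covariances of the
estimates. The null hypothesis that each $\theta$
is zero may be tested by the normal deviate
thus:

$$c = \hat{\theta}/\sigma_{\hat{\theta}},$$

where $\sigma_{\hat{\theta}}$ is the standard error of $\hat{\theta}$, the square root of the
corresponding diagonal element of $(A'WA)^{-1}$.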


The adequacy of the model may be assessed 
by calculating the sum of weighted squared 
deviations of the observed statistics from their 
values predicted by fitting the model to the 
data. This is distributed as chi-square for
$p - r$ degrees of freedom, where $p$ is the number
of observed statistics, and $r$ is the number of
parameters fitted to the data. Thus,

$$\chi^2_{p-r} = (x - A\hat{\theta})'W(x - A\hat{\theta}).$$
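
The estimation and goodness-of-fit formulas above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the program used in the study: the helper names (`solve`, `weighted_least_squares`) are hypothetical, and the coefficient rows in `A_SET1` are assumed random-mating expectations for the between- and within-pairs mean squares of MZ twins reared together, MZ twins reared apart, and siblings reared together. The authoritative coefficients are those tabulated by Eaves (1969), which are not reproduced in this paper.

```python
def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def weighted_least_squares(A, x, df):
    """Return (estimates, standard errors, chi-square) for the model x = A*theta + e."""
    p, r = len(A), len(A[0])
    w = [n / (2.0 * xi ** 2) for n, xi in zip(df, x)]  # information: w = n / (2 x^2)
    AtWA = [[sum(w[i] * A[i][j] * A[i][k] for i in range(p)) for k in range(r)]
            for j in range(r)]
    AtWx = [sum(w[i] * A[i][j] * x[i] for i in range(p)) for j in range(r)]
    theta = solve(AtWA, AtWx)  # theta_hat = (A'WA)^-1 (A'Wx)
    # The j-th diagonal element of (A'WA)^-1 is the variance of the j-th estimate.
    se = []
    for j in range(r):
        unit = [1.0 if k == j else 0.0 for k in range(r)]
        se.append(solve(AtWA, unit)[j] ** 0.5)
    fitted = [sum(A[i][j] * theta[j] for j in range(r)) for i in range(p)]
    chi2 = sum(w[i] * (x[i] - fitted[i]) ** 2 for i in range(p))  # p - r df
    return theta, se, chi2


# ASSUMED coefficients of (D_R, H_R, E_1, E_2); check against Eaves (1969).
A_SET1 = [
    [1.00, 0.5000, 1.0, 2.0],  # MZ reared together, between pairs
    [0.00, 0.0000, 1.0, 0.0],  # MZ reared together, within pairs
    [1.00, 0.5000, 1.0, 1.0],  # MZ reared apart, between pairs
    [0.00, 0.0000, 1.0, 1.0],  # MZ reared apart, within pairs
    [0.75, 0.3125, 1.0, 2.0],  # siblings reared together, between pairs
    [0.25, 0.1875, 1.0, 0.0],  # siblings reared together, within pairs
]

# Illustrative true values: broad heritability .5, sqrt(H_R / D_R) = .5.
true_theta = [8.0 / 9.0, 2.0 / 9.0, 0.25, 0.25]
x = [sum(a * t for a, t in zip(row, true_theta)) for row in A_SET1]  # error-free statistics
df = [1919, 1920, 1919, 1920, 2559, 2560]  # 6,400 pairs split .3 / .3 / .4

theta_hat, std_err, chi2 = weighted_least_squares(A_SET1, x, df)
g = [t / s for t, s in zip(theta_hat, std_err)]  # expected normal deviates
print(theta_hat, chi2)
```

Because the statistics are generated without sampling error, the estimates should reproduce the true values and the chi-square should vanish up to rounding, which is a convenient check on any implementation.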


The foregoing argument assumes that the ob- 
served statistics are normally distributed. An 
observed variance may be represented as a 
constant multiple of a variable that follows 
the chi-square distribution. Thus, the observed 
statistics will be approximately normal, pro- 
vided the degrees of freedom are large enough 
for the chi-square distribution to be approximated
by the normal distribution. The convergence
of the chi-square to the normal distribution
with increasing degrees of freedom
is slow (Kendall & Stuart, 1963). As far as 
possible, therefore, the subsequent discussion 
avoids specific allusion to the significance of 
estimates derived from statistics based on few 
degrees of freedom. No sample sizes are 
quoted for tests of significance that would 
result from analyses based on fewer than 100 
pairs of individuals in the total sample. 
Samples as small as this will in any event be 
shown to be too small for most practical 
purposes. If the parameters of the model have
the true values $\theta$ and, for a hypothetical sample,
$\hat{\theta} = \theta$, the chi-square test of goodness of fit
will be zero since the “observed” statistics from
which $\hat{\theta}$ is estimated will be identical to their
expected values obtained by fitting the model.
For a given parameter, the expected value of
the ratio $\hat{\theta}/\sigma_{\hat{\theta}}$ will be $g = \theta/\sigma_{\hat{\theta}}$. For a given
sample size, the values of $g$ will depend on the
true values of the four parameters, $\theta$, and are
related to these values by the coefficients, $A$,
of $\theta$ in the expectations of the observed statistics,
through the maximum likelihood equations
given above.

For a given experimental design, for a
population with constant values for $\theta$, and for
samples in which the ratio of the degrees of
freedom for all the statistics is constant, the
value of $g$ for any parameter will be a linear
function of the square root of the total sample
size. Thus the value of $g$ for a sample of size
$k$, denoted $g_k$, may be calculated for a given
parameter, and the corresponding value $g_m$
for sample size $m$ may be calculated by substitution
in the equation:

$$g_m = \sqrt{m/k}\; g_k. \qquad [1]$$


This method of extrapolation is illustrated 
later and applied to the determination of 
sample sizes necessary to detect the given 
parameters at the desired level. 

It is required for a given $\theta > 0$ that the
sample size be such that the null hypothesis that $\theta$
is not greater than zero will be rejected at the
5% level in 95% of possible samples. Other
values for the power of the test might be
selected if desired. For any real genetical situation,
the genetic parameters are expected to be
zero or positive, so that any large negative
values will be attributable to chance or failure
of the model. The test appropriate, therefore,
is a one-tailed $c$ test of the null hypothesis
that $\theta/\sigma_{\hat{\theta}}$ is not greater than zero.

Writing $\sigma$ for $\sigma_{\hat{\theta}}$, and $g$ for $\theta/\sigma$, the power of
the test is the probability that $\hat{\theta} > 1.65\sigma$,
given that $\theta > 0$. Hence, the power of the
test is the area under the curve for a variable
$\hat{\theta}$ which is $N[\theta, \sigma^2]$ corresponding to values of
$\hat{\theta} \geq 1.65\sigma$. If $c = (\hat{\theta} - \theta)/\sigma$, so that $c$ is
$N[0,1]$, when $\hat{\theta} = 1.65\sigma$, $c = 1.65 - g$. Thus
the power of the test is the area under the
curve for a variable $c$ which is $N[0,1]$ for
values of $c$ exceeding $1.65 - g$. It is required
that this area be .95. From the standard properties
of the normal distribution, it is known
that values of $c$ greater than $-1.65$ must be
taken for the required area to be .95. Hence,
for the power of the test to be .95, it is necessary
that $1.65 - g = -1.65$ or $g = 3.30$.
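
The argument can be verified numerically. The sketch below, which assumes only the normal approximation already stated, computes the power of the one-tailed test as the area of the unit normal curve above 1.65 minus g; the function name `power_one_tailed` is illustrative.

```python
from statistics import NormalDist


def power_one_tailed(g, critical=1.65):
    """Probability that theta_hat / sigma exceeds `critical` when its expected value is g."""
    # c = (theta_hat - theta) / sigma is N[0, 1]; the null hypothesis is
    # rejected whenever c exceeds critical - g.
    return 1.0 - NormalDist().cdf(critical - g)


for g in (0.0, 1.65, 3.30):
    print(g, power_one_tailed(g))
```

With g = 0 the function returns the size of the test (about .05), with g = 1.65 it returns one half, and with g = 3.30 it returns about .95, which is the requirement adopted in the text.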

Starting with a set of parameters of known
value and a hypothesized model $A$, values
may be calculated for $x$. For an arbitrary
sample size $k$, the corresponding values of
$g_k$ may be calculated providing the relative
contributions of the different groups of individuals
are assigned so that the degrees of
freedom for the statistics can take definite
values. Substitution of $g_k$ and $k$ in Equation 1
above enables the sample size $m$ to be calculated,
which would give a value of 3.3 for
$g_m$. Should more rigorous criteria be desired,
the value of $g_m$ may be set correspondingly
higher.
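
Equation 1 can be rearranged to give the required sample size directly, m = k (g_m / g_k)^2. A minimal sketch follows; the function names are illustrative, and the value g_k = 6.6 used in the example is an arbitrary number chosen for easy arithmetic, not a result from this study.

```python
import math


def g_for_sample_size(g_k, k, m):
    """Equation 1: expected normal deviate for m pairs, given g_k for k pairs."""
    return math.sqrt(m / k) * g_k


def required_sample_size(g_k, k=6400, g_target=3.30):
    """Smallest whole number of pairs m for which g_m reaches g_target."""
    return math.ceil(k * (g_target / g_k) ** 2)


# Hypothetical illustration: a parameter with g_k = 6.6 in a sample of 6,400 pairs.
m = required_sample_size(6.6)
print(m, g_for_sample_size(6.6, 6400, m))
```

Halving the required value of g quarters the sample, so a parameter twice as easy to detect as the criterion at 6,400 pairs needs only 1,600 pairs; conversely, a parameter with g_k below 3.30 requires more than 6,400 pairs.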


METHOD 


Simulation experiments were conducted to
discover the sample size necessary to obtain
estimates of $D_R$, $H_R$, $E_1$, $E_2$, and the broad
heritability $h_b^2$, which are significant at a
desired level in a given proportion of samples.
The broad heritability is the proportion of the
total variation due to genetic causes, and may
be estimated from (Mather, 1949):

$$h_b^2 = \frac{\tfrac{1}{2}D_R + \tfrac{1}{4}H_R}{\tfrac{1}{2}D_R + \tfrac{1}{4}H_R + E_1 + E_2}.$$

Two of the three minimal sets of data given
above were investigated. These were Set 1:
MZ_T, MZ_A, and S_T; and Set 2: MZ_T, S_A, and
S_T. Hypothetical values for the four parameters
were calculated on the assumption that
$\tfrac{1}{2}D_R + \tfrac{1}{4}H_R + E_1 + E_2 = 1$, and $E_1 = E_2$.
The generation of the “data” and their subsequent
analysis followed the method described
and illustrated in an earlier study (Eaves,
1969) on the assumption of a random-mating
population. The relative sizes of $D_R$ and $H_R$
were adjusted to give values of $h_b^2$ from .1 to .9
by intervals of .1 for each of three values of
$\sqrt{H_R/D_R}$, namely, .1, .5, and 1.0. The formula
$\sqrt{H_R/D_R}$ is an indication of the relative
importance of dominant gene action in the
determination of variation. If the gene frequencies
are equal at all loci involved in the
expression of a trait, $\sqrt{H_R/D_R}$ is an estimate
of the dominance ratio (Mather, 1949). On the
basis of the studies reported earlier (Eaves,
1969, 1970), the relative proportions of the
groups comprising each minimal set of data
were fixed so that the efficiency of the estimation
was maximized as far as possible. It was
decided that the proportion of the pairs in
each group comprising Set 1 should be fixed
at .3 MZ_T, .3 MZ_A, and .4 S_T. This leads to
the sacrifice of some efficiency with regard to
the estimation of broad heritability, but
the equal proportions of MZ_T and MZ_A tend
to optimize the estimation of $E_1$ and $E_2$,
while the reasonably high proportion of S_T
will help to reduce the correlation between
$D_R$ and $H_R$, thus maximizing the efficiency of
the interpretation of gene action for a given
trait. The proportions for Set 2 were fixed at
.15 MZ_T, .55 S_A, and .30 S_T. The previous
studies suggest that these proportions provide 
the greatest all-round efficiency for this ex- 
perimental design. Since two minimal sets of 
data were considered, for each combination of 
nine levels of broad heritability and three 
levels of dominance, a total of 54 different ex- 
perimental situations were investigated. 
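
The construction of the hypothetical parameter values can be sketched as follows. The algebra is an inference from the constraints stated in the text (total variation of 1, equal environmental components, a chosen broad heritability, and a chosen value of the square root of H_R/D_R), not a reproduction of the original program; the function names are illustrative.

```python
def parameters(heritability, dominance):
    """Return (D_R, H_R, E_1, E_2) for a broad heritability and sqrt(H_R / D_R)."""
    # Genetic variance: D_R/2 + H_R/4 = heritability, with H_R = dominance^2 * D_R.
    d_r = heritability / (0.5 + 0.25 * dominance ** 2)
    h_r = dominance ** 2 * d_r
    # The remaining variation is split equally between E_1 and E_2.
    e = (1.0 - heritability) / 2.0
    return d_r, h_r, e, e


def broad_heritability(d_r, h_r, e_1, e_2):
    """Proportion of the total variation due to genetic causes."""
    genetic = 0.5 * d_r + 0.25 * h_r
    return genetic / (genetic + e_1 + e_2)


# Two minimal sets x nine heritabilities x three dominance levels.
situations = [(design, h / 10.0, ratio)
              for design in ("Set 1", "Set 2")
              for h in range(1, 10)
              for ratio in (0.1, 0.5, 1.0)]
print(len(situations))
print(parameters(0.5, 0.5))
```

The list has 54 entries, matching the number of experimental situations stated above, and feeding any generated parameter set back into `broad_heritability` returns the heritability it was built from.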

The weighted least squares analyses were conducted on the sets of
hypothetical statistics, and the variance-covariance matrix of the estimates
of D_R, H_R, E_1, and E_2 was obtained from (A'wA)^-1 as outlined above. The
method of the computation is illustrated for an experiment of unit size by
Eaves (1969). The weights employed in these simulations were those
appropriate for an experiment based on k = 6,400 pairs, divided according to
the proportions designated for each set in turn. Thus for Set 1, with a
proportion of .3 MZ_T, it was assumed that there were 6,400 × .3 = 1,920
pairs of MZ_T, so that the between-pairs mean square was based on 1,919
degrees of freedom, and the within-pairs mean square was based on 1,920
degrees of freedom. The degrees of freedom for the other statistics were
calculated similarly. The choice of 6,400 pairs as the initial sample size k
is a matter of convenience. This sample size provides, in most cases, values
of g_k which are large enough to reduce rounding errors in extrapolation to
larger or smaller samples. The fact that the number has a simple square root
further simplifies the extrapolation procedure.
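The allocation of pairs and degrees of freedom described above can be sketched in a few lines of Python. This is only an illustration, not the original program: the function name and dictionary layout are assumptions, while the proportions and the rule (between-pairs degrees of freedom = pairs - 1, within-pairs degrees of freedom = pairs) are taken from the text.

```python
# Sketch: split k pairs among the groups of a design and derive the
# degrees of freedom of the between- and within-pairs mean squares.
# Names are illustrative; proportions and the df rule follow the text.

SET_1 = {"MZ_T": 0.30, "MZ_A": 0.30, "S_T": 0.40}
SET_2 = {"MZ_T": 0.15, "S_A": 0.55, "S_T": 0.30}


def degrees_of_freedom(proportions, k=6400):
    """Return {group: (pairs, between_df, within_df)} for a total of k pairs."""
    result = {}
    for group, share in proportions.items():
        pairs = round(k * share)
        result[group] = (pairs, pairs - 1, pairs)
    return result


for name, design in (("Set 1", SET_1), ("Set 2", SET_2)):
    print(name, degrees_of_freedom(design))
```

For Set 1 this gives 1,920 pairs of MZ_T with 1,919 and 1,920 degrees of freedom, as stated in the text.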
In the previous studies, the computations were terminated with the
calculation of the variance-covariance matrix, since the relative
efficiencies of the designs were evaluated on the basis of the elements of
this matrix. In this study, however, a further check on computation was
provided by postmultiplying the (A'wA)^-1 matrix by the vector (A'wx) to
estimate the elements of θ. Since there was no experimental error introduced
into the sampling of this hypothetical population, the estimated parameters
agreed exactly with those employed to generate the statistics. Each estimate
was divided by its standard error to give the mean value of g which could be
expected for a sample of 6,400 pairs from a population which had the genetic
properties and was subject to the relative environmental influences assumed
in generating the data. The standard error of h_b^2 was calculated as the
square root of its variance. An expression for the variance of h_b^2 is given
by Eaves (1970). The mean value of g for the broad heritability could thus be
calculated by dividing the estimate of h_b^2 by its standard error. The
sample sizes necessary to provide a value of g_n = 3.30 were calculated by
linear extrapolation using Equation 1 above.
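The check described above (estimates recovered exactly when the statistics contain no sampling error) can be illustrated with a small weighted least squares sketch. Everything specific here is an assumption made for illustration: the 3 × 2 design matrix A is a toy, not the coefficient matrix of the genetic model, and the weights are taken as df/(2x²), the reciprocal of the usual large-sample variance of a mean square. Only the formulas θ = (A'wA)^-1 A'wx and g = estimate/standard error follow the text.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def weighted_least_squares(A, x, w):
    """Return (estimates, standard errors) for diagonal weights w."""
    rows, cols = len(A), len(A[0])
    AtWA = [[sum(w[i] * A[i][r] * A[i][c] for i in range(rows))
             for c in range(cols)] for r in range(cols)]
    AtWx = [sum(w[i] * A[i][r] * x[i] for i in range(rows)) for r in range(cols)]
    theta = solve(AtWA, AtWx)
    std_errors = []
    for j in range(cols):
        # Column j of (A'wA)^-1; its j-th element is the variance of estimate j.
        unit = [1.0 if i == j else 0.0 for i in range(cols)]
        std_errors.append(math.sqrt(solve(AtWA, unit)[j]))
    return theta, std_errors


# Hypothetical toy problem: three statistics, two parameters.
A = [[1.0, 0.0], [1.0, 2.0], [1.0, 1.0]]
theta_true = [2.0, 0.5]
x = [sum(a * t for a, t in zip(row, theta_true)) for row in A]  # error-free statistics
df = [1920, 1919, 2560]
w = [d / (2.0 * v * v) for d, v in zip(df, x)]  # assumed weights: 1 / (2 x^2 / df)

theta, se = weighted_least_squares(A, x, w)
g = [t / s for t, s in zip(theta, se)]
print(theta, g)
```

Because the statistics are generated without error, the estimates reproduce `theta_true` to rounding precision, which is the computational check the paragraph describes.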

The necessary weighted least squares simula- 
tions were conducted on the KDF-9 computer 
at the University of Birmingham, and the 
necessary sample sizes were calculated on a 
desk machine. 


RESULTS AND DISCUSSION 


Tables 1 and 2 give the values of g_k for minimal data Sets 1 and 2,
respectively. These values are calculated for an overall sample of 6,400
pairs divided in the proportions fixed above and represent the expected
values of the given genetic and environmental parameters divided by their
corresponding standard errors for samples of this size. Comparison of the
values obtained for the two sets confirms the general conclusions of the
earlier studies. Set 1 enables more efficient detection of the two
environmental effects and the broad heritability, whereas Set 2 allows more
efficient detection and separation of additive and nonadditive genetic
variation. The comparably small values of g_k for H_R in both sets of data
are remarkable and disturbing. They suggest that the detection of dominance
even in large experiments of this kind is a difficult matter. If positive
assortative mating makes a significant contribution to variation for a
particular phenotype, the between-families genetic variance would be inflated
relative to that within families, thus reducing still further the efficiency
with which dominance could be detected.
TABLE 1

EXPECTED VALUES OF θ̂/σ_θ̂ FOR ESTIMATES OF BROAD HERITABILITY AND THE FOUR
MAIN COMPONENTS OF GENETIC AND ENVIRONMENTAL VARIATION BASED ON THE FIRST
MINIMAL SET OF DATA

Level of                  Expected value of θ̂/σ_θ̂ for test of:
dominance(a)   h_b^2      D_R      H_R      E_1      E_2    h_b^2

    .1           .1       .98      .00    31.90    16.14     4.44
                 .3      3.22      .02    31.49    15.01    14.85
                 .5      6.13      .04    31.22    14.35    31.43
                 .7      9.88      .06    31.06    14.01    68.70
                 .9     13.96      .08    30.99    13.87   254.02

    .5           .1       .87      .12    31.90    16.14     4.44
                 .3      2.86      .41    31.49    15.01    14.85
                 .5      5.43      .78    31.22    14.35    31.43
                 .7      8.73     1.23    31.06    14.01    68.73
                 .9     12.28     1.71    30.99    13.87   254.17

   1.0           .1       .64      .37    31.90    16.14     4.44
                 .3      2.14     1.17    31.49    15.01    14.85
                 .5      4.02     2.29    31.22    14.35    31.44
                 .7      6.40     3.56    31.06    14.01    68.79
                 .9      8.93     4.86    30.99    13.87   254.60

Note. An overall sample size of 6,400 pairs was assumed, divided
proportionately into .3 MZ_T, .3 MZ_A, and .4 S_T.
(a) Level of dominance defined as √(H_R/D_R).


Tables 3 and 4 give the number of pairs
necessary for Sets 1 and 2, respectively, to

yield a value of g of 3.3 for the given parameters for selected proportions
of genetic and environmental variance. The calculation of the entries of
these tables is best illustrated by a specific example. Consider data Set 2
when the broad heritability is .5 and the degree of dominance is .5. To
determine the experimental size required to detect additive genetic variation
(D_R) at the 5% level in 95% of the possible samples, the corresponding value
of g_k for a sample of 6,400 pairs is read from Table 2. This value is 8.42.
Substituting in Equation 1 above, letting the unknown necessary experimental
size be n:

$$3.30 = \sqrt{n/6400} \times 8.42$$

$$\sqrt{n} = (3.30 \times 80)/8.42 = 264/8.42 = 31.35 \qquad [2]$$

whence

$$n = (31.35)^2 = 983 \text{ pairs.}$$


It will be recalled that this sample would consist of proportions of .15
MZ_T, .55 S_A, and .30 S_T. Multiplying the sample size of 983 pairs by these
proportions yields the actual number of pairs of each group required to
obtain significant estimates of D_R in 95% of samples of this size. These are
147 pairs of MZ_T, 541 pairs of S_A, and 295 pairs of S_T. For a given level
of significance, the numerator of Equation 2 is constant and can be used for
all determinations of sample size.
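The worked example above can be reproduced with a short Python sketch. The function names are hypothetical; the extrapolation rule n = k × (3.30/g_k)², with k = 6,400, is simply Equation 2 rearranged, and the proportions are those given for Set 2.

```python
K = 6400          # pairs assumed in the simulations
TARGET_G = 3.30   # value of g required for detection in the text


def required_pairs(g_k, target=TARGET_G, k=K):
    """Extrapolate from g_k (expected at k pairs) to the n that gives `target`."""
    return k * (target / g_k) ** 2


def allocate(n_pairs, proportions):
    """Split a total number of pairs among the groups of a design."""
    return {group: round(n_pairs * share) for group, share in proportions.items()}


n = round(required_pairs(8.42))  # D_R, Set 2, heritability .5, dominance .5
groups = allocate(n, {"MZ_T": 0.15, "S_A": 0.55, "S_T": 0.30})
print(n, groups)
```

Since 3.30 × √6400 = 264 is constant for a given significance criterion, the same function serves for every tabulated value of g_k.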

It is necessary to exercise some caution when comparing the significance
levels and sample sizes for genetic parameters from simulations assuming the
same broad heritability and different levels of dominance. It must be
remembered that in the generation of the data the broad heritability was held
constant, and not the proportion of variance accounted for either by D_R or
H_R. The latter, which is a function of the broad heritability and the ratio
√(H_R/D_R), may be calculated easily knowing these two parameters. The
tabulated values, therefore, allow the two designs to be compared strictly,
but do not permit statements about the effect on the significance of either
D_R or H_R of changes in other parameters, since in no case is the value of
D_R or H_R held constant. It can be seen that where a parameter is held
constant for different comparisons within a design, for example h_b^2 for
different levels of √(H_R/D_R), the significance of that parameter varies
little.


TABLE 2

EXPECTED VALUES OF θ̂/σ_θ̂ FOR ESTIMATES OF BROAD HERITABILITY AND THE FOUR
MAIN COMPONENTS OF GENETIC AND ENVIRONMENTAL VARIATION BASED ON THE SECOND
MINIMAL SET OF DATA

Level of                  Expected value of θ̂/σ_θ̂ for test of:
dominance(a)   h_b^2      D_R      H_R      E_1      E_2    h_b^2

    .1           .1      1.77       …        …        …        …
                 .3      5.47       …        …        …        …
                 .5      9.46       …        …        …        …
                 .7     13.71       …        …        …        …
                 .9     18.41       …        …        …        …

    .5           .1      1.58       …        …        …        …
                 .3      4.88       …        …        …        …
                 .5      8.42       …        …        …        …
                 .7     12.20       …        …        …        …
                 .9     16.09       …        …        …        …

   1.0           .1      1.18       …        …        …        …
                 .3      3.65       …        …        …        …
                 .5      6.30       …        …        …        …
                 .7      9.07       …        …        …        …
                 .9     11.98       …        …        …        …

Note. An overall sample size of 6,400 pairs was assumed, divided
proportionately into .15 MZ_T, .55 S_A, and .30 S_T.
(a) Level of dominance defined as √(H_R/D_R).


The surprising consistency of the significance of E_1 over the whole range of
heritability requires comment. This consistency results from the fact that
the within-pair variance of MZ_T is a direct estimate of E_1, and this
statistic will receive a large weight in the analysis, contributing most of
the information about E_1. If it may be assumed that E_1 is estimated solely
from this statistic, then it may be shown that the value of g for E_1 is
constant for all values of E_1 and depends on the sample size alone. Writing
E_1 for the within-pair variance of MZ_T and n for its degrees of freedom,
the variance of E_1 is 2E_1^2/n, so its standard error is E_1 × √2/√n.


TABLE 3

NUMBER OF PAIRS NEEDED FOR THE FIRST MINIMAL DATA SET TO OBTAIN VALUES OF THE
FIVE PARAMETERS TABULATED WHICH ARE SIGNIFICANT AT THE 5% LEVEL IN 95% OF ALL
POSSIBLE SAMPLES OF THE GIVEN SIZE

Level of                  Total number of pairs required to estimate:
dominance      h_b^2        D_R          H_R      E_1     E_2    h_b^2

    .1           .1       72,566   500 × 10^6     100     267    3,534
                 .3        6,721   174 × 10^6     100     309      316
                 .5        1,854    44 × 10^6     100     338      100
                 .7          714    19 × 10^6     100     355      100
                 .9          358    11 × 10^6     100     362      100

    .5           .1       92,075    48 × 10^5     100     267    3,534
                 .3        8,519      414,600     100     309      316
                 .5        2,363      114,555     100     338      100
                 .7          914       46,066     100     355      100
                 .9          462       23,833     100     362      100

   1.0           .1      170,156      509,100     100     267    3,534
                 .3       15,218       50,913     100     309      316
                 .5        4,313       13,289     100     338      100
                 .7        1,702        5,498     100     355      100
                 .9          874        2,951     100     362      100

Note. The values are only approximate, especially for samples bigger than
10^6 pairs. Because the distribution of the observed statistics is not normal
for small samples, exact numbers are not given for a total experimental size
of less than 100 pairs.


TABLE 4

NUMBERS OF PAIRS NEEDED FOR THE SECOND MINIMAL DATA SET TO OBTAIN VALUES OF
THE FIVE PARAMETERS TABULATED WHICH ARE SIGNIFICANT AT THE 5% LEVEL IN 95% OF
ALL POSSIBLE SAMPLES OF THE GIVEN SIZE

Level of                  Expected number of pairs required to estimate:
dominance      h_b^2        D_R          H_R      E_1      E_2    h_b^2

    .1           .1       22,246   500 × 10^6     135      207    6,322
                 .3        2,329   150 × 10^6     139      310      578
                 .5          778    28 × 10^6     143      547      167
                 .7          371    14 × 10^6     144    1,351      100
                 .9          206   5.8 × 10^6     145   10,801      100

    .5           .1       27,916   2.2 × 10^6     135      208    6,322
                 .3        2,926      187,300     139      316      584
                 .5          983       56,563     143      565      172
                 .7          468       24,402     144    1,426      100
                 .9          269       13,405     145   10,076      100

   1.0           .1       50,051      230,400     135      210    6,360
                 .3        5,230       21,509     139      327      595
                 .5        1,756        6,558     143      603      180
                 .7          847        2,890     144    1,575      100
                 .9          485        1,590     145    8,889      100

Note. The values are only approximate, especially for samples bigger than
10^6 pairs. Because the distribution of the observed statistics is not normal
for small samples, exact numbers are not given for a total experimental size
of less than 100 pairs.


TABLE 5

COMPARISON OF VALUES FOR THE NORMAL DEVIATE (c) FOR D_R, H_R, E_1, AND E_2
FROM THE ANALYSIS OF REAL DATA(a) WITH THE VALUES OF θ̂/σ_θ̂ PREDICTED FROM
THE SIMULATION STUDIES

Parameter    Estimate       Observed   Predicted     Difference   Probability
                            value      value for c

D_R          [illegible]    4.98       4.19          .79          .4
H_R          .627           2.45       2.32          .13          .9
E_1          .07            7.00       7.67          .67          .5
E_2          .06            2.00       1.02          .98          .3

(a) The data are those of Burt (1966), reanalyzed by Jinks and Fulker (1970).
The observed broad heritability is … and the ratio √(H_R/D_R) is
approximately … . The predicted values of c are calculated for a sample of …
pairs, divided according to data Set … when the broad heritability is .9 and
the ratio √(H_R/D_R) is 1.0.


Dividing E_1 by its standard error gives

$$g = \frac{E_1 \sqrt{n}}{E_1 \sqrt{2}} = \sqrt{n}/\sqrt{2}. \qquad [3]$$

That is, the significance level of E_1, denoted by g, is independent of E_1.
For Set 2 it was decided that a sample of 6,400 pairs should contain
6,400 × .15 = 960 pairs of MZ_T, so the degrees of freedom of E_1 will be
960. Substituting this value for n in Equation 3 gives the value of g for E_1
on the assumption that this is estimated from the within-pair variance of
MZ_T alone. Thus,

$$g = \sqrt{960}/\sqrt{2} = 30.98/1.41 = 21.91,$$

a value which agrees very closely with those given in Table 2 for the mean
significance level of E_1, confirming that the other statistics provide
little information about this parameter.
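The argument that g for E_1 depends only on the degrees of freedom can be checked numerically. The sketch below is an illustration under the text's own assumption that E_1 is estimated solely from the within-pair variance of MZ_T, with variance 2E_1²/n; the function names are hypothetical.

```python
import math


def g_explicit(e1, df):
    """E_1 divided by its standard error, with SE = E_1 * sqrt(2 / df)."""
    standard_error = e1 * math.sqrt(2.0 / df)
    return e1 / standard_error


def g_for_e1(df):
    """Closed form after E_1 cancels: g = sqrt(df / 2)."""
    return math.sqrt(df / 2.0)


# Set 2: 6,400 x .15 = 960 pairs of MZ_T; Set 1: 6,400 x .3 = 1,920 pairs.
print(round(g_for_e1(960), 2), round(g_for_e1(1920), 2))

# The result does not change with the size of E_1 itself.
print(g_explicit(0.3, 960), g_explicit(5.0, 960))
```

For Set 1 the same formula gives about 30.98, close to the tabulated values for E_1 in Table 1.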


The discussion has been restricted so far to 
the consideration of the sample sizes required 
to permit reliable detection of the parameters 
concerned. This is clearly only preliminary to 
the much more exacting task of estimation 
for which even larger samples may be required 
to provide estimates of parameters for which 
the confidence limits are sufficiently narrow. 
A simple example will suffice to illustrate this difficulty. Consider a trait
for which the broad heritability is .5. For the 95% confidence limits of
h_b^2 to be ±.1, it is required that 2σ = .1, that σ = .05, and consequently
that h_b^2/σ = g = 10.0. To obtain a value of g of 10.0, it is required that

$$\sqrt{n} = (10 \times 80)/g_k = 800/g_k$$

(see Equation 2 above), where g_k is the corresponding expected value of
θ̂/σ_θ̂ for a sample of 6,400 pairs. Employing the tabulated values for g_k
gives, for Set 1, n = (800/31.43)^2 = 648 pairs; and for Set 2,
n = (800/20.40)^2 = 1,538 pairs. These are substantially larger samples than
those required for the mere detection of heritable variation (see Tables 3
and 4).
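The confidence-limit calculation can be sketched in Python as follows. The function name is hypothetical, and the sketch adopts the text's approximation that 95% limits correspond to ±2σ; the values 31.43 and 20.40 are the g_k quoted above for the two sets.

```python
def pairs_for_confidence_limits(parameter, half_width, g_k, k=6400):
    """Pairs needed so that the 95% limits of `parameter` are +/- half_width.

    Uses the text's approximation (limits = +/- 2 sigma) and the same
    square-root extrapolation from k pairs as Equation 2.
    """
    sigma = half_width / 2.0
    target_g = parameter / sigma          # .5 / .05 = 10 in the example
    return k * (target_g / g_k) ** 2


set_1 = round(pairs_for_confidence_limits(0.5, 0.1, 31.43))
set_2 = round(pairs_for_confidence_limits(0.5, 0.1, 20.40))
print(set_1, set_2)
```

Narrowing the limits is expensive: halving `half_width` doubles the target g and therefore quadruples the number of pairs.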
In order to validate the conclusion of these simulations, it is desirable to
compare the results of the study with those obtained from the analysis of
real data. No direct comparison is possible because published analyses of
quantitative behavioral data are based on data corresponding neither to Set 1
nor Set 2. One study, however, provides a large enough sample of sufficient
groups of relatives to permit the least squares estimation of the four main
components of variation. This is the study of Burt (1966) on the inheritance
of intelligence. The total sample of 826 pairs used in Burt's study was
composed of 95 pairs of MZ_T, 53 pairs of MZ_A, 127 pairs of dizygotic twins
reared together, 264 pairs of S_T, 151 pairs of S_A, and 136 pairs of
unrelated individuals reared together.

The necessary reanalysis of these data, using the method of weighted least
squares, has been conducted by Jinks and Fulker (1970). They found that the
basic four-parameter model was the simplest that would adequately fit the
data. The estimates they obtained for the four parameters are reproduced in
Table 5. When allowance was made for assortative mating, the broad
heritability was estimated as .86, and the dominance ratio, approximated by
√(H_R/D_R), is about .75.

The design of this experiment is not exactly the same as either of the two
sets considered in this study, but it is unlikely that its efficiency is
markedly different. The inclusion of S_A would tend to improve the efficiency
with which D_R and H_R are estimated, relative to that anticipated from an
experiment designed according to Set 1, while the inclusion of MZ_A will
ensure that the estimation of E_2 will be more efficient than would be the
case from Set 2. Thus Burt's experiment probably incorporates the advantages
of both Sets 1 and 2 and would be more efficient than either. For the purpose
of the subsequent discussion, it is supposed that the efficiency of this
experiment is close to that of Set 2. It is possible to calculate, using the
values of Table 2, the expected values of g for a sample of this size, with
the observed values of heritability and dominance. The additional criterion
of the simulations, that E_1 and E_2 are equal, is fortuitously met by this
body of data (see Table 5). The closest simulated approximation to this study
is the case where h_b^2 is .9 and the dominance ratio is 1.0. The values of
g, on the basis of these assumptions, are given in Table 5 for a sample of
800 pairs. Assuming that c is N(g, 1), where g is the calculated expected
value of c, it is possible to test the null hypothesis that the observed
values of c, considered separately, do not differ from the values of g
calculated by the method of simulation. Table 5 gives the deviation of the
observed values of c from the values of g, and the probability that
deviations as large as, or larger than, those observed would occur by chance.
The good agreement between observed and expected confirms that the simulation
method provides a good approximation to the experimental situation.
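The comparison in Table 5 amounts to a two-sided test of a unit normal deviate. The sketch below is a hedged illustration: the function names are hypothetical, the assumption that c is distributed as N(g, 1) is the one stated in the text, and the example pairs of observed and predicted values are those legible in Table 5.

```python
import math


def scale_g(g_k, n, k=6400):
    """Expected g for n pairs, extrapolated from the value at k pairs."""
    return g_k * math.sqrt(n / k)


def two_sided_p(observed_c, expected_g):
    """Probability of a deviation at least this large if c ~ N(expected_g, 1)."""
    deviation = abs(observed_c - expected_g)
    return math.erfc(deviation / math.sqrt(2.0))


for name, observed, predicted in (("H_R", 2.45, 2.32),
                                  ("E_1", 7.00, 7.67),
                                  ("E_2", 2.00, 1.02)):
    print(name, round(two_sided_p(observed, predicted), 1))
```

Rounded to one decimal place, these probabilities match the final column of Table 5 for the three parameters shown.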

The nucleus of psychogenetic studies has 
been supplied by classical twin studies, 
although conclusions are often based on 
samples of fewer than 100 pairs. Such are 
probably adequate to estimate the within- 
family environmental variance from the MZr 
and to provide a test of the significance of 
within-family genetic influences (Vandenberg, 
1962). There is, however, little further information about gene action to be
derived from studies based solely on twin data (Jinks & Fulker, 1970), so the
desirability of larger twin studies of the classical kind is questionable,
except for the information they could provide about genotype-environment
interaction (Jinks & Fulker, 1970).

The two designs discussed in this study, and 
the many possible extensions of these (Cattell, 
1960; Jinks & Fulker, 1970), allow further 
genetic information to be obtained. The 
simulations performed assuming these more 
complex designs, however, indicate that reliable 
results with regard to gene action are possible 
only with much larger samples than those 
hitherto employed. Indeed, although the 
detection of dominant gene action is theoretic- 
ally possible with these alternative designs, 
the necessary sample sizes are virtually 
prohibitive. Inbreeding studies for the detection of directional dominance
have been used (Barrai, Cavalli-Sforza, & Mainardi, 1964; Jinks & Fulker,
1970) and should be further investigated, though these provide no information
about ambidirectional dominance. However, it is possible to estimate the
amount of additive genetic variation from the covariance of offspring and
parent. The expectation for this statistic in terms of the genetical model is
¼D_R (Mather, 1949), although this is likely to give an inflated estimate of
the additive variation on account of the environmental influence of parent on
offspring. If the parent-offspring covariance is free from environmental
bias, then the inclusion of this statistic in the analysis would certainly
lead to a marked reduction in the correlation between estimates of D_R and
H_R and consequently increase the efficiency with which these two parameters
may be estimated and separated.

The variance of the means of groups of 
distant relatives, for example, between groups 
of cousins, is also free from dominance varia- 
tion. Where extensive pedigree data are 
available, it is conceivable that these com- 
ponents might be estimated directly from a 
hierarchical analysis of variance. The weak- 
nesses of this approach to the estimation of 
additive genetic variance are as follows: 

1. The coefficients of D_R in the expectations of the variance components
become smaller as the relationship becomes more remote. Remote relationships
will thus contribute little additional information to the estimation of D_R.
This effect will be made more obvious since:

2. The estimates of components derived 
from the means of the groups of more distant 
relatives will be based on very few degrees of 
freedom and will thus receive little weight 
in the least squares analysis. 

3. If certain parts of the pedigree are sub- 
jected to systematic environmental differences 
due, for example, to migration to different 
localities with different cultural norms, this 
will inflate the variance components. On the 
other hand, if a pedigree has been subjected 
to a common cultural effect, it is likely that 
the difference between groups within the 
pedigree would be underestimated. It is possible that this consideration
might form the basis for the detection of cultural influences on the
formation of behavior (Fisher, 1930).

It should now be clear that the detailed 
analysis of a single trait requires much larger 
samples than those originally conceived in 
many earlier experiments in human psychogenetics. With regard to the
estimation of nature-nurture ratios by the multiple abstract variance
analysis (MAVA) method, Cattell (1963) suggested that 250 pairs would make
“the confidence limits narrow to an acceptable range.” No indication is given
about how stringent the requirements of acceptability are, nor the degree of
genetic determination that might be predictably detected by this method. It
can be seen now that 250 pairs would not allow the detection of genetic
variation with any confidence when the broad heritability is less than .5. An
overall sample size of 2,500 pairs is suggested as adequate for the complete
solution of the MAVA model (Cattell, 1963). The conclusions of this study
indicate that a sample of this size would permit considerably confident
estimation of the broad heritability, the amount of additive genetic
variance, and the two environmental components, except when these effects are
small. It would still be inadequate, however, for the detection of
nonadditive genetic variation, except when the effect is substantial.
Moreover, it should be remembered that these conclusions appertain only to an
experiment that is designed to be as near as possible optimal with respect to
the proportions of the populations measured. Thus, for a sample of 2,500
pairs, either the measurement of 750 pairs of both MZ_A and MZ_T and 1,000
pairs of S_T, or 375 pairs of MZ_T, 1,375 pairs of S_A, and 750 pairs of S_T,
would be involved. Such is, in any event, a considerable undertaking.

A comparable need for large studies is 
emerging in another field, that is, with regard 
to the application of multivariate procedures 
to the elucidation of genetic hypotheses. A 
multivariate extension of the classical twin 
study has already been employed (Bock & 
Vandenberg, 1968; Vandenberg, 1965), and 
large samples are seen as essential in order that stable factor structure may
be located and suitable comparisons conducted. An effective fusion between
the detail of the biometrical-genetical approach and the breadth of the
multivariate approach is still far from being realized. It is evident,
however, that both will require more time and effort if samples are to be
large enough for a convincing genetic analysis of human behavior.


SUMMARY 


A method of computer simulation for the 
investigation of factors affecting experimental 
design in the area of human psychogenetics 
has been further developed to study the effects 
of sample size on the detection and estimation 
of broad heritability and four possible com- 
ponents of genetic and environmental varia- 
tion for quantitative characters in human 
populations. 

The results of earlier simulation studies are confirmed, and it is shown that
sample sizes need to be much larger than those generally employed if it is
desired to go beyond the mere demonstration that variation has a heritable
component. The results of this simulation study are shown to agree favorably
with the conclusions of a reanalysis of published data. It is noted that
possible improvements or alternatives to the designs discussed are not free
from limitations.


REFERENCES 

Barrai, I., Cavalli-Sforza, L. L., & Mainardi, M. Testing a model of dominant
inheritance for metrical traits in man. Heredity, 1964, 19, 651-668.

Bock, R. D., & Vandenberg, S. G. Components of heritable variation in mental
test scores. In S. G. Vandenberg (Ed.), Progress in human behavior genetics.
Baltimore: Johns Hopkins Press, 1968.

Burt, C. The genetic determination of differences in intelligence. British
Journal of Psychology, 1966, 57, 137-

Cattell, R. B. The multiple abstract variance analysis equations and
solutions: For nature-nurture research on continuous variables. Psychological
Review, 1960, 67, 353-372.

Cattell, R. B. The interaction of hereditary and environmental influences.
British Journal of Statistical Psychology, 1963, 16, 191-210.

Eaves, L. J. The genetic analysis of continuous variation: A comparison of
experimental designs applicable to human data. British Journal of
Mathematical and Statistical Psychology, 1969, 22, 131-147.

Eaves, L. J. The genetic analysis of continuous variation: A comparison of
experimental designs applicable to human data. II. Estimation of heritability
and comparison of environmental components. British Journal of Mathematical
and Statistical Psychology, 1970, 23, 189-198.

Fisher, R. A. The genetical theory of natural selection. Oxford: Oxford
University Press, 1930.

Hayman, B. I. Maximum likelihood estimation of genetic components.
Biometrics, 1960, 16, 369-381.

Jinks, J. L., & Fulker, D. W. Comparison of the biometrical genetical, MAVA,
and classical approaches to the analysis of human behavior. Psychological
Bulletin, 1970, 73, 311-349.

Kendall, M. G., & Stuart, A. The advanced theory of statistics. Vol. 1.
London: Griffin, 1963.

Mather, K. Biometrical genetics. London: Methuen, 1949.

Nelder, J. A. Estimation of variance components in certain types of
experiment in quantitative genetics. In O. Kempthorne (Ed.), Biometrical
genetics. London: Pergamon Press, 1960.

Vandenberg, S. G. The hereditary abilities study: Hereditary components in a
psychological test battery. American Journal of Human Genetics, 1962, 14,
220-237.

Vandenberg, S. G. Multivariate analysis of twin differences. In S. G.
Vandenberg (Ed.), Methods and goals in human behavior genetics. New York:
Academic Press, 1965.


(Received July 14, 1970)


Vol. 77, No. 3


JUDITH G. 


In conjunction with th… ideological and strategic base, great … opinions
about mental illness, especially … labeling, care, and treatment of mental …
has emerged a sizable body of research … examples are considered here. This
revi… studies of attitudes about mental illness … Historical trends in such
attitudes … the susceptibility of such attitude… tical experience is followed
by … attitudes and behavior tow… work in this field are considered.


Psychiatric theory and practice in the United States have changed radically
in the past 30 years, particularly in the last decade. As the predominance of
the psychoanalytic orientation began to decline after World War II, there
emerged a growing concern about the patient's social context in addition to
the individual characteristics stressed in the psychoanalytic model of health
and illness. In the 1950s most of the new thinking and innovation in theory
and treatment focused on psychiatric hospitalization and the hospital milieu,
while in the 1960s more attention has been devoted to outpatient facilities
in the community. This movement in the mental health field, known variously
as social psychiatry, community psychiatry, and community mental health,
succeeded by the early 1960s in capturing a substantial share of contemporary
interest, academic instruction, and federal funds for research and
modifications in treatment methods.2 While these labels are differently
interpreted by several subgroups of mental health specialists, there is
growing empirical evidence to indicate that they refer to a distinct and
independent psychiatric orientation toward conceptions of mental health and
illness, etiologies, and treatment strategies. There is little discussion in
the literature regarding the philosophical underpinnings of this point of
view, but evidence is emerging to demonstrate differences in the attitudes
and actions of those who share this orientation.

2 As Caplan (1969) has illustrated, community psychiatry of the 1960s is not
a new phenomenon in the history of American psychiatry, but is related to the
"moral treatment" ideology prevalent in the United States up to the time of
the Civil War.

In conjunction with the rise of social psychiatry as an increasingly accepted
ideological and strategic base, there has developed great interest in
attitudes and opinions about mental illness, especially among those who are
involved in the labeling, care, and treatment of mental patients. Since the
late 1950s when questionnaires were constructed to investigate such
attitudes, there has emerged a sizable body of research in this area,
concerning the delineation of attitudes held by the general public, by mental
health personnel, and by patients and their families; the susceptibility of
such attitudes to modification through academic or practical experience; and
the relationship between attitudes and behavior.

The object of this article is to review the literature concerning attitudes
toward mental illness, describing the major instruments used and briefly
considering representative studies. No attempt has been made to cite every
relevant investigation or to deal with broader issues such as psychiatric
ideologies in general; attention is directed specifically to studies of
attitudes about mental illness, mental hospitals, and mental patients.


Historical Trends in Attitudes about Mental Illness


Definitions of deviant behavior and the assignment of labels to such behavior
strongly influence attitudes toward those regarded as deviant. The label
seems to activate preexisting beliefs and value systems, usually to the
detriment of the individuals so labeled. A brief historical overview of such
definitions and concomitant attitudes reflects the extent of their
interaction.

By referring to behavioral deviance as mental illness as we do today, we
imply that such a condition is a burden, a handicap caused by the invasion of
a foreign agent analogous to a germ or a virus that attacks an innocent host
and lingers interminably. This construction underlines the undesirable nature
of the condition, which most people would like to avoid having themselves or
witnessing in others. After all, nobody wants to be ill, and few people
actively enjoy the presence of others who are ill. The problem that currently
concerns so many in the mental health professions is not this negative
evaluation of mental illness, but the accompanying rejecting attitudes
manifested toward the mentally ill and formerly ill, together with other
implications inherent in the medical model regarding etiological theory,
therapeutic stance, and the feasibility of preventative strategies.

In a very real sense, mental patients have 
taken the place of lepers as targets of public 
disgust, dislike, and rejection. As Foucault 
(1965) has so eloquently described, insanity 
was defined as “unreason” and typically con- 
sidered a part of everyday life from the time 
of the Greeks through the Middle Ages, when 
those who behaved peculiarly were labeled 
madmen or fools. Not until the seventeenth century were such people regarded
as a public threat and confined to special institutions, many of which had
served as leprosariums until that disease disappeared from the Western world.
The constitution of madness as a mental illness, the introduction of the
medical model to psychiatric formulations, occurred at the end of the
eighteenth century and led to the rise of the "scientific psychiatry" of the
nineteenth century.

With less drama but more specific documentation, Bockhoven (1963) and Caplan
(1969) have traced the history of American psychiatry from the eighteenth
century, with its emphasis on moral (i.e., psychological) treatment, to the
medicalization of emotional disturbances in the nineteenth and early
twentieth centuries. Moral treatment consisted of temporarily sending the
disturbed individual to a “retreat” where “he was made comfortable, his
interest aroused, his friendship invited, and discussion of his troubles
encouraged [Bockhoven, 1963, p. 12]." This approach was based on the
assumptions that disturbed behavior was caused either by ignorance or
incorrect understanding (that is, a remediable cognitive lack) and that it
could be modified by manipulation of social and psychological variables.

In the midnineteenth century, both 2 
these assumptions were abandoned, as ie 
moral treatment itself, Instead, it became £€' 
erally believed that disturbed behavior si 
the result of a physical disease of unkno a 
etiology, existing like an ulcer within the is 1 
tient, which could be treated only by chem 

or physical means, 


of 


x we 
Since such means 
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then unknown, patients seldom received more 
than custodial care. Bockhoven attributed this 
change in hospital practices to several factors: 
lack of inspired leadership after the inno-
vators of moral treatment died; failure to
plan ahead for adequate facilities so that the
existing ones became terribly overcrowded;
and the rather abrupt appearance after the
Civil War of large numbers of “foreign insane
paupers" of low social and economic status
who largely replaced the middle-class Yankee
patients who had formerly constituted the
majority among patient populations. As the
mental hospitals changed from homelike hav-
ens to huge custodial warehouses, discharge 
rates declined steadily. 

After the turn of the century, the first signs 
of renewed interest appeared in the methods 
and underlying philosophy of moral treat-
ment, and in the ensuing years a point of view
has evolved in psychiatry that is quite inde-
pendent of the scientific model of the 1800s.
It has become increasingly acknowledged by
many professionals in the mental health field
that mental illness can be understood as an
exaggeration of particular behaviors common
to all men, brought about by stressful life
conditions and resulting in impairment of the
ability to cope with social expectations and
standards. According to this viewpoint, psy-
chopathology is not seen as a subdermal phe-
nomenon but as the product of transactions
between the individual and his social and
physical environment. This necessarily en-
tails a shift away from the traditional medical
model of disease toward essentially psycho-
social conceptions of problems in living.

Whereas psychiatric symptoms used to be in-
terpreted as signs of physical illness, and in
the psychoanalytic frame of reference
as defenses against the individual's internal
processes, it is now becoming more common to
define a symptom as a way of dealing with ex-
ternal events or other people (cf. Haley,
1963).

Replacement of the medical model by the
psychosocial or public health model in psychia-
try represents a fairly recent change
among the more highly trained professionals in
the field—primarily those with postgraduate
training—and is not yet found
among lower-ranking personnel in psychiatric
institutions, much less the general public. As
the following 


from nurses and aides in their opinions about 


Measures Used 


Before reviewing specific studies of atti- 
tudes about mental illness, it would seem 
helpful to describe in some detail the measur- 
ing instruments most commonly employed. 
These include Nunnally’s (1961) question-
naire, the Star abstracts, Gilbert and Levin-
son's (1957) Custodial Mental Illness Ideol-
ogy Scale (CMI), and Cohen and Struening’s
(1962) Opinions about Mental Illness Scale


Froemel and Zolik’s 
Attitude toward Mental Illness 
(ATMI), and Reznikoff’s (1963) Multiple
Choice Attitudes Questionnaire, these instru- 
ments have been used primarily or exclusively 
by their authors, and their usefulness has not 
yet been clearly established. 

Nunnally’s (1961) questionnaire, con- 
structed in 1954, was designed to learn what 
the public "knows and thinks" about mental
illness—that is, to elicit information as well
as attitudes. Over 3,000 opinion statements
related to the causes, symptoms, prognosis, in-
cidence, and social significance of mental
health problems were collected from diverse
public and professional sources. By removing
apparent duplicates and redundancies, the
number of items was reduced to 240, with a
7-point Likert format. Three hundred and
fifty people were given this form; their re-
sponses were factor analyzed to identify un-
derlying dimensions describing the content of
public information and thinking about mental
illness. Ten factors were thus obtained. They


4 S. Star. What the public thinks about mental
health and mental illness. Paper presented at the
annual meeting of the National Association for Men-
tal Health, National Opinion Research Council, 1959.
were not statistically strong; few of the load-
ings were above .40, and these 10 factors ac- 
counted for less than 25% of the total item 
variance. Nunnally interpreted this to show 
that public information about mental illness 
is not highly structured or crystallized. Be- 
liefs about mental illness held by the public, 
as defined in the first several factors, concern 
the peculiar physical appearance of the men- 
tally ill, their lack of will power, the greater 
susceptibility of women and the aged, and 
the role of morbid thoughts in precipitating 
mental illness. In short, these factors describe 
the particular contents of the public's concep- 
tions of mental illness. 
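
As a rough illustration of why such a solution counts as statistically weak, the share of total item variance a factor accounts for can be computed from its loadings: the sum of the squared loadings divided by the number of items. The sketch below assumes standardized items and orthogonal factors; the function name and the loadings are invented for illustration and are not Nunnally's data.

```python
def variance_explained(loadings):
    """Return the proportion of total item variance each factor accounts for.

    `loadings` holds one row per item; each row lists that item's loading
    on every factor (standardized items, orthogonal factors assumed).
    """
    n_items = len(loadings)
    n_factors = len(loadings[0])
    return [
        sum(row[f] ** 2 for row in loadings) / n_items
        for f in range(n_factors)
    ]


# Hypothetical loadings for 4 items on 2 factors (not Nunnally's data).
example = [
    [0.40, 0.10],
    [0.30, 0.20],
    [0.20, 0.30],
    [0.10, 0.40],
]
shares = variance_explained(example)  # one proportion per factor
total = sum(shares)                   # proportion explained by all factors
```

With no loading above .40, as in this toy matrix, each factor explains only a small fraction of the item variance, which is the pattern the text describes.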

The Star abstracts (see Footnote 4) repre- 
sent a different approach toward the elucida- 
tion of attitudes about mental illness. Each 
abstract consists of a paragraph, written in 
nontechnical style, describing behavior meant 
to illustrate a particular diagnostic entity such 
as simple schizophrenia or neurotic depres- 
sion. Subjects read these paragraphs and may 
be asked to rank them in terms of perceived 
pathology, social distance, or other parameters. 
Individuals or groups of respondents may 
then be compared in terms of their notions of 
what constitutes abnormal behavior. 

In contrast to Nunnally’s questionnaire, 
which describes the specific content of beliefs 
about mental illness (e.g., “the eyes of the 
insane are glassy”), subsequent investigators 
have developed scales to classify respondents 
in terms of underlying ideologies. A series of 
studies by Gilbert and Levinson and their
associates defined and studied the nature of
the ideological positions of humanism versus
custodialism, and constructed the Custodial 
Mental Illness Ideology Scale (CMI) to elicit 
these positions. The CMI consists of 20 Lik- 
ert-type statements of opinion on basic ques- 
tions concerning mental illness and patient 
care, and was designed to place respondents 
along a single continuum of attitudes ranging 
from custodialism to humanism. The extreme 
custodial point of view holds that mental
patients cannot ever be really cured, that they
are potentially dangerous and need external
controls; in general, it is associated with au-
thoritarianism and is highly correlated with
the California F Scale. Its converse, human-
ism, is related to a generally egalitarian ori-
entation. While the CMI's authors assume
that attitudes about mental illness fall within 
a single dimension, related studies have shown 
that the polar extremes can be further subdi- 
vided. Thus the humanistic point of view can 
be analyzed in terms of components such as 
psychotherapeutic and sociotherapeutic ori- 
entations. The concept of custodialism, aligned 
as it is with authoritarianism, similarly lends 
itself to further refinement. 

The CMI was the first carefully designed, 
psychometrically adequate instrument devel- 
oped to assess attitudes toward mental illness. 
However, workers in this field became increas- 
ingly unhappy with its underlying assumption, 
that such attitudes fall within a single descrip- 
tive dimension. When Cohen and Struening 
(1962) developed a multidimensional scale,
Opinions about Mental Illness (OMI), most
investigators adopted it rather than the CMI
to assess such attitudes. 

The OMI was developed from a pool of 70 
Likert-type opinion items that were written 
by the authors or adapted from Nunnally’s
questionnaire, the California F Scale, and the
CMI. They were administered to two large
samples of Veterans Administration hospital
personnel. The responses of each sample were
factor analyzed, and five independent factors
were identified. Scales were then developed
from the original 70 items to measure each of
these factors, so that the final 51-item OMI
questionnaire provides five separate scores for
each respondent. The five factors are: 


Factor A: Authoritarianism. This is clearly
identified with the California F Scale and in-
cludes its authoritarian submission and anti-
intraception combined with a view of the
mentally ill as an inferior class requiring co-
ercive handling.

Factor B: Benevolence. A kindly, paternal-
istic view toward patients, derivative from
religion and humanism rather than science.

Factor C: Mental Hygiene Ideology. This
orientation maintains that “mental illness is
an illness like any other.” A medical model is
adapted to psychiatric problems, focusing on
individual maladaptation.

Factor D: Social Restrictiveness. Its cen-
tral belief is that the mental patient is a
threat to society, particularly the family, and
must therefore be restricted in his functioning
both during and after hospitalization.

Factor E: Interpersonal Etiology. The posi- 
tive pole of this factor reflects the belief that 
mental illness arises from interpersonal experi- 
ence, especially deprivation of parental love 
during childhood. 


In a later study, Struening and Cohen 
(1963) demonstrated the factorial stability of 
these opinions across three samples of mental 
hospital personnel identical in occupational 
professional composition, but greatly varied 
in religious preference and regional back- 
ground. The results of the three factor analy- 
ses of the 1963 study provide the basis for 
the current scoring of the 51 items into the 
measures of the five factors described above. 
Psychometric properties of the five factor 
scales are also included. 
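
The idea of scoring one questionnaire into five separate factor scores can be sketched as follows. This is only an illustration under stated assumptions: the item-to-factor key, the item numbers, and the responses are hypothetical, and the sketch ignores details such as reverse-keyed items that the published scoring may involve.

```python
FACTORS = ("A", "B", "C", "D", "E")


def score_profile(responses, key):
    """Sum Likert responses into one score per factor.

    responses: dict mapping item number -> numeric response
    key: dict mapping factor label -> list of item numbers
         (a hypothetical key here, not the published OMI key)
    """
    return {f: sum(responses[i] for i in key[f]) for f in FACTORS}


# Hypothetical 10-item key; the real OMI assigns its 51 items differently.
demo_key = {"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8], "E": [9, 10]}

# Invented responses on a 1-3 scale, purely to exercise the function.
demo_responses = {i: (i % 3) + 1 for i in range(1, 11)}

profile = score_profile(demo_responses, demo_key)
```

Each respondent thus receives a five-score profile rather than a single total, which is what distinguishes this multidimensional approach from a single-continuum scale such as the CMI.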

The four scales described thus far were de- 
signed to assess attitudes about mental illness 
and mental patients. A related but slightly 
different approach was followed by Souelem 
(1955), who developed two forms of a 36-item 
scale with dichotomous agree-disagree format, 
meant to elicit patients’ attitudes toward 
mental hospitals. The scale has also been used 
to assess attitudes of various professional 
groups who deal with mental patients. Kahn, 
Jones, MacDonald, Connors, and Burchard 
(1963) developed a 100-item scale meant to 
cover 12 areas measuring patients’ attitudes 
toward psychiatrists, mental illness, and hos- 
pitalization in an attempt to improve upon 
the Souelem scale, which they regarded as too 
generalized. The scale was given to 54 pa- 
tients, and the 45 most reliable items were 
subjected to factor analysis. Seventeen factors 
emerged, of which the first 5 account for half 
the total variance. The authors had expected 
factor analysis to reduce in number the origi- 
nal 12 dimensions they had conceptualized, 
but the opposite occurred. They later felt 
these results were meaningful in describing 
the complexity of attitudes involved. They 
believed that earlier work oversimplified such
attitudes. For example, the variable of restriction,
typically regarded as unidimensional, was
found to include several components, in-
cluding accepted restriction and resented re-
striction. Each of these in turn was influenced
by other variables such as dependency or 
psychological mindedness. In short, the study 
suggests that the attitudes involved in mental 
illness are far more complex and interrelated 
than is generally acknowledged. 

In addition to the more widely used ques- 
tionnaires, it seems worth noting Baker and 
Schulberg’s (1967) Community Mental
Health Ideology Scale (CMHI), which covers 
a point of view largely omitted in the other 
questionnaires that were constructed before 
this dimension became popular. The CMHI 
is designed to measure an individual's degree 
of adherence to community mental health 
ideology. Originally consisting of 88 Likert- 
type items representing five conceptual cate- 
gories, its final form contains 38 items repre- 
senting three concepts that were found to 
characterize this ideological orientation. These 
concepts include focus on the total popula- 
tion rather than just those actively seeking 
psychiatric help, involvement of a variety of 
community resources in working with patients, 
and preventive efforts via environmental in- 
tervention. The concurrent validity and reli- 
ability of the scale are good, and it has been 
used to effectively differentiate professional 
groups in terms of the extent of their endorse- 
ment of this attitude dimension. 

At present, the most widely used instru- 
ment for the measurement of attitudes toward 
mental illness continues to be the OMI. Al- 
though it has been variously criticized as too 
complex (cf. Lawton, 1964a) or incomplete 
(cf. Baker & Schulberg, 1967), it seems to be 
the most comprehensive, reliable, and valid 
instrument now available for the measurement 
of attitudes toward mental illness, and is 
accordingly most popular among investigators 
in this field. 


Public Attitudes toward Mental Illness 


The stigma of the label “mental illness” is 
becoming widely acknowledged and docu- 
mented. Within a psychiatric hospital, where 
many patients are sent against their will, in- 
mates seldom share the rights, liberties, and 
satisfactions that civilians enjoy, and on their 
return home they often find that being an 
ex-mental patient is more of a liability than 
being an ex-criminal in the pursuit of housing,
jobs, and friends. A variety of recent studies
have illustrated the generally negative and 
rejecting attitudes of most Americans regard- 
ing mental illness and the mentally ill. Al- 
though dissenting views and findings have 
been reported, they form a minority opinion. 

A classic experimental study of opinions 
about mental illness was conducted by Cum- 
ming and Cumming (1957) in a small Cana- 
dian town. The investigators tested residents 
before and after a 6-month educational cam- 
paign designed to promote more accepting 
attitudes toward mental illness. They stressed 
three propositions in their films and group 
discussions: first, that the range of normal 
behavior is wider than often believed; second, 
that deviant behavior is not random but has 
a cause and thus can be understood and modi- 
fied; and third, that normal and abnormal 
behavior fall within a single continuum and 
are not qualitatively distinct. The townspeo- 
ple readily accepted the first two propositions 
and in fact went beyond psychiatrists in the 
range of behavior regarded as normal. But 
the third proposition was so unpalatable that 
the community eventually rejected the entire 
educational program. As summarized by Sus- 
ser and Watson (1962), these results were 
taken to indicate that the sample feared men- 
tal illness and tried to ignore its manifesta- 
tions as far as possible: thus, Proposition 1 
was compatible with their outlook. When an 
individual's behavior became too deviant to 
overlook, the community wanted him to be 
segregated through hospitalization; to some
extent, acceptance of the second proposition 
provided justification for such action since 
hospitalization could be regarded as in the 
interest of the patient as well as the com- 
munity. The third proposition was disturbing 
because it suggested that anyone could become 
insane under certain circumstances; this idea
conflicted with the predominant values of the 
people of this community and was rejected in 
favor of maintaining the latter. In short, the 
Cummings’ study demonstrated the initially 
negative attitudes toward mental illness of a 
middle-class community, their relationship to 
a more extensive system of values, and the 
unfeasibility of modifying a specific attitude 
in isolation from this system.


Most investigations of public attitudes
toward mental illness have been based on a
survey rather than experimental model. An 
early and simple survey was carried out in 
1947 by Ramsey and Seipp (1948a, 1948b). 
Selected adults in Trenton, New Jersey, were 
asked six questions meant to elicit their no- 
tions about the etiology and treatment of 
mental illness. The authors reported that re- 
spondents with higher educational and occu- 
pational levels were less apt to view mental 
illness as punishment for sin or the outcome 
of poor living conditions, were less inclined to 
believe in the deleterious effects of associating 
with the mentally ill, and were more optimistic 
about the possibility of recovery. 

Far more sophisticated and extensive was 
a 6-year survey conducted during the 1950s 
by Nunnally (1961). Among his voluminous 
findings, selected observations from his 1954 
data report that “as is commonly suspected, 
the mentally ill are regarded with fear, dis- 
trust and dislike by the general public [Nun-
nally, 1961, p. 46].” The stigma associated
with mental illness was found to be very gen- 
eral, both across social groups and across atti- 
tude indicators, with little relation to demo- 
graphic variables such as age and education. 


Old people and young people, highly educated people 
and people with little formal training—all tend to 
regard the mentally ill as relatively dangerous, dirty, 
unpredictable and worthless [Nunnally, 1961, p. 51]. 


A strong negative halo surrounds the mentally 
ill—"they are considered, unselectively, as 
being all things bad [Nunnally, 1961, p. 
233]." Like Nunnally, Freeman and Kasse- 
baum (1960) found no evidence that attitudes 
about mental illness are related to educa- 
tional level, in their study of over 400 adults 
representing the general public in the state of 
Washington. Furthermore, knowledge of the 
technical vocabulary of psychiatry was re- 
lated only weakly to such attitudes. The au- 
thors suggested that those in charge of health 
education programs be cautious in thinking 
that facts necessarily alter people’s opinions.

In contrast to Nunnally (1961) and Free- 
man and Kassebaum (1960), Hollingshead 
and Redlich (1958) found distinct and dra- 
matic differences in attitudes and knowledge 
about mental illness and mental patients as 
a function of social class and education. In
their studies, where attitudes were inferred
from observed behavior in psychiatric treat-
ment situations, they found that upper-class 
members have more favorable attitudes to- 
ward psychiatrists, have clearer conceptions 
of their role, are better informed about mental 
illness, and are more accepting of mental pa-
tients than those in the lower classes.

The general public tends to reject disturbed 
behavior that is socially visible, even if it is 
not severe in terms of its incapacitating ef- 
fects on the patient. In our culture, it is less 
socially acceptable to behave in a disruptive, 
bizarre, or troublesome fashion than to act 
withdrawn, detached, or depressed. Thus a 
paranoid schizophrenic is more often identi- 
fied as mentally ill than a simple schizo-
phrenic, an acting-out child more often than
a withdrawn one. Lemkau and Crocetti
(1962), using three Star abstracts, found that
91% of their urban sample identified the
paranoid as mentally ill, 78% identified the
schizophrenic as mentally ill, and 62% identi-
fied the alcoholic as mentally ill. Manis,
Hunt, Brawer, and Kercher (1965)
found that contrary to their predictions,
psychiatrists as well as the general pub-
lic were more influenced by the social
visibility than the severity of symptoms
in deciding whom to label mentally ill, based
on a set of 20 descriptive paragraphs. The au-
thors suggested that the cultural content of 
these descriptions, regarding degree of social 
conformity, served as the primary determinant 
among the samples they studied. On the other 
hand, Dohrenwend and Chin-Shong (1967) 
found that mental health experts were indeed 
more sensitive than the public to the severity
of withdrawn behavior as well as antisocial 
behavior, which was what lower-class samples 
tended to regard as most pathological. Phil- 
lips (1963, 1964), studying a sample of white, 
married women, found that they also were 
more apt to regard disruptive behavior as
disturbed. In addition, he found that people 
who sought psychiatric help were more 
strongly rejected by the normal sample than 
were those who consulted clergy or medical 
personnel, as measured by a scale of social 
distance. Yamamoto and Dizney (1967) rep- 
licated Phillips’ study, using a sample of
student teachers in a midwestern university,
and obtained comparable results. 

As noted above, Nunnally’s findings lead to 
some question of the widespread assumption 
that lower-class population groups are gen- 
erally more tolerant of disturbed behavior 
than are middle- and upper-class groups. 
Dohrenwend and Chin-Shong (1967) directly 
rebutted this assumption in their study of 
public attitudes toward deviant behavior in a 
New York City sample. Lower-class respond- 
ents were more apt to ignore the pathology of 
withdrawn behavior and regarded antisocial 
behavior as being serious but not mentally 
ill. Once they decided that an individual was 
indeed mentally ill, they were more rejecting 
than were respondents with higher socioeco- 
nomic status. As these authors noted: 


lower-status groups are predisposed to greater in- 
tolerance of the kinds of deviance that both they and 
higher-status groups define as serious mental illness. 
Their definition of serious mental illness is narrower 
than that of higher-status groups, giving the appear- 
ance of greater tolerance of deviance from the van- 
tage point of the higher-status groups, including the 
mental health professions [Dohrenwend & Chin-Shong,
1967, p. 432]. 


Attitudes of Mental Health Personnel 


Public attitudes toward mental illness are 
of some interest to those concerned with the 
origins and maintenance of psychopathologi- 
cal behavior, and are clearly relevant to work- 
ers involved in primary prevention programs. 
Clinicians and administrators responsible for 
treating patients (secondary prevention) are, 
however, more specifically concerned with the 
attitudes about mental illness and mental pa- 
tients held by mental health personnel—psy- 
chiatrists, psychologists, social workers, nurses, 
and aides—since the impact of these attitudes 
is increasingly recognized as integral to the 
experiences and careers of the patients who 
are exposed to them. 

Most studies of the attitudes held by men- 
tal health workers have considered employee 
subgroups separately. Investigators have typi- 
cally reported that personnel with lower 
status are more authoritarian and restrictive 
in their attitudes toward mental patients 
while those with advanced professional train-
ing—psychiatrists, psychologists, and social
workers—show more awareness of the
strengths patients possess, are more liberal
and tolerant in their attitudes, and are more 
optimistic about their prospects for recovery. 

Cohen and Struening (1962, 1964, 1965) 
have conducted what is perhaps the most ex- 
tensive series of studies concerning attitudes 
of mental health workers toward mental ill- 
ness. Working within the Veterans Adminis- 
tration Psychiatric Hospital system, they 
studied members of 19 occupational categories 
and grouped them empirically into four clus- 
ters in terms of their attitudes as elicited by 
the OMI questionnaire. White-collar work- 
ers, including technicians, nurses, dentists, 
and nonpsychiatric physicians, scored low on 
Authoritarianism. Blue-collar workers, in- 
cluding aides, maintenance and kitchen work- 
ers, were typically Authoritarian and Socially 
Restrictive. Their attitudes were not Benevo- 
lent, and they did not advocate Mental Hy- 
giene Ideology or Interpersonal Etiology. The 
third cluster, consisting of psychologists and 
social workers, endorsed attitudes that were 
the converse of blue-collar workers—low on 
Authoritarianism and Social Restrictiveness, 
high on Mental Hygiene Ideology and Inter- 
personal Etiology. Clergymen, constituting 
the fourth group, showed patterns similar to 
but less extreme than Group 3. Psychiatrists 
did not fit into any cluster, but resembled 
clergymen more than the other groups. 

Cohen and Struening (1965) also reported 
that the overall atmosphere of a given hospital 
is largely determined by the attitudes of 
nurses and aides and that authoritarian 
restrictive atmospheres were negatively cor- 
related with discharge rates (Cohen & Struen- 
ing, 1964). The extent of Authoritarian and 
Benevolent attitudes was found to vary across
hospitals in different geographical locations
for nurses and aides, probably according to
the local subculture, but did not vary for pro-
fessionals.

Findings consonant with those of Cohen
and Struening have been reported by Appleby,
Ellis, Rogers, and Zimmerman (1961), Rezni-
koff (1963), Reznikoff et al. (1964), Wright
and Klein (1966), Williams and Wil-
liams (1961), and Vernallis and St. Pierre
(1964). Appleby et al., using the CMI,
OMI, and a Q sort to measure role con-
ceptions, found that professional staff mem-
bers differed from aides and administrative
personnel in being less authoritarian and more 
humanistic. Reznikoff investigated attitudes
toward psychiatrists, psychiatric hospitals, 
and psychiatric treatment held by nurses and 
aides, using a 12-item Multiple Choice Atti- 
tudes Questionnaire he constructed. He found 
nurses generally more favorable in their atti- 
tudes toward psychiatrists and psychotherapy 
than aides, and supervisory personnel more 
favorable than others within each group. He 
was subsequently unable to replicate these 
findings, however. Wright and Klein (1966) 
found professional staff more accepting than 
aides and other employees with less education 
and formal training, while hospital personnel 
as a group were more accepting than members 
of the adjacent community, a small southern 
town. Williams and Williams (1961) found 
that student nurses were less Authoritarian 
and scored lower on anomie than aides, as
measured by two scales they constructed. The
student nurses were also more “modern” in
their attitudes about the symptoms and stigma 
of mental illness. 

Attitudes of volunteer workers at Veterans 
Administration Hospital in Topeka, Kansas,
were compared with those of other hospital
employees by Vernallis and St. Pierre (1964).
Their responses were most similar to those of
male aides: they were not receptive to Mental
Hygiene Ideology and were more Socially
Restrictive than white-collar and professional
workers, although they believed, like most of
the staff, that an Interpersonal Etiology
largely accounts for disturbed behavior.

In short, the available evidence shows dis-
tinct attitudinal patterns for different cate-
gories of mental health workers. Since these
categories differ in terms of demographic
variables such as age, sex, and education as
well as job function, the variations in attitudes
cannot be attributed solely to occupational
differences. In fact, as several studies suggest,
these attitudes seem largely shaped by age,
education, and social class, just as one's
occupation is often largely dictated by these
variables.

Middleton (1953), using a 47-item Preju-
dice Test with a dichotomous agree-disagree
format, found that better educated, younger,
and more intelligent hospital employees were

OPINIONS ABOUT MENTAL ILLNESS 


less prejudiced than others. This matches the 
more general finding that age and extent of 
prejudice are positively related. He also re- 
ported that the more experienced workers 
were more prejudiced, which logically follows
since more experienced workers are typically
older. Like Middleton, Lawton (1964b, 1965), 
using the OMI, found that Authoritarianism 
and Social Restrictiveness were positively re-
lated to age and years of service. He also
found a negative correlation between Social
Restrictiveness and education. Similarly,
Clark and Binks (1966) reported that greater 
education and younger age were associated 
with more liberal attitudes about mental ill-
ness. In contrast, Reznikoff (1963) found low
but significant relationships between positive- 
ness of overall attitudes and years of experi- 
ence for both nurses and aides. Weaknesses in 
the statistical design of his study cast some 
doubt on the validity of his results, however.
Cohen and Struening (1962) did not find age 
highly correlated with any of their factors on 
the OMI, but education was negatively corre- 
lated with Authoritarianism and Social Re-
strictiveness, and positively correlated with
Mental Hygiene Ideology and Interpersonal 
Etiology factors. 

Baker and Schulberg’s (1967) CMHI 
Scale, a single-factor measure of community 
mental health beliefs, was used to rank se- 
lected professional groups in terms of their 
adherence to this attitude dimension. The 
psychologists included in this study (postdoc-
toral students in community mental health
and members of the American Psychological
Association's Division 12—Clinical Psychol-
ogy) obtained the highest scores. Occupa-
tional therapists came next, followed by a
random sample of members of the American
Psychiatric Association. The lowest scores,
indicating least acceptance of community
mental health beliefs, were those of a random
sample of the American Psychoanalytic Asso-
ciation. The authors found several correlates
to this attitude dimension, such as age and
occupation. Those who scored high were apt
to be younger and to have received their train-
ing more recently. They spent more of their
time in administration, teaching, and commu-
nity consultation and were relatively less in-
volved in direct patient treatment. They
tended to work in universities, community
clinics, hospitals, and schools rather than in 
private practice. 

Ehrlich and Sabshin (1964) differentiated 
three rather independent ideological orienta- 
tions among psychiatrists: psychotherapeutic, 
somatotherapeutic, and sociotherapeutic. The
psychotherapeutic position espouses the prin- 
ciples of the mental hygiene movement, ac- 
cepts the medical model of mental illness, and 
is endorsed by more psychoanalytic and dy- 
namically oriented psychiatrists. The somato- 
therapeutic point of view is comparable to 
that of the directive organic, advocating 
chemical and physiological etiological expla- 
nations and therapeutic strategies. The socio- 
therapeutic position is largely concerned with 
the network of people, places, and things con- 
stituting the ecology in which patients live; 
its adherents devote their therapeutic efforts 
to family and environment rather than the 
inner mental mechanisms analyzed by advo- 
cates of the psychotherapeutic position. It is 
probably endorsed by the same people who 
would obtain high scores on Baker and Schul- 
berg’s (1967) CMHI Scale. The instrument 
used to identify these three ideologies was a 
28-page questionnaire of opinion statements 
in Likert format, covering the areas of eti- 
ology, nature of the therapeutic process, ap- 
propriateness of different therapeutic proce- 
dures for different diagnostic groups, and be- 
liefs about the therapeutic competencies of 
various mental health specialists. Embedded 
in this questionnaire were three scales specifi-
cally designed to measure commitment to
each of the three postulated orientations. As
Sabshin (1969) pointed out, this question-
naire dealt predominantly with issues related
to psychiatric hospitalization because it was
written in the late 1950s when there was
great interest in milieu therapy. A follow-up 
study now underway includes items regarding 
activities in a community context, which 
hardly existed 10 years ago. 

Although most studies of attitudes about
mental illness have focused on separate occu-
pational groups, some investigators
have dealt with general staff beliefs about 
mental patients. In one of the earlier studies 
of its kind, Myers and Schaffer (1954) re-
ported more positive staff attitudes toward
upper-class patients, an observation that was
also reported by Belknap (1956). Mendel 
and Rapport (1969), studying determinants 
of the decision to hospitalize patients appear- 
ing in psychiatric admitting offices, found that 
the professional staff members responsible for 
such decisions all believed that symptom se- 
verity was a major consideration in determin- 
ing which patients to hospitalize and that a 
history of prior hospitalization was largely 
irrelevant. Contrary to these stated beliefs, 
the authors found that patients who were 
actually hospitalized were indistinguishable 
from those who were not, on the basis of 
symptom severity, but far more had a prior 
history of hospitalization. It was also found 
that social workers hospitalized fewer pa- 
tients than did psychiatrists or psychologists 
and that clinicians with less than 6 months of 
clinical experience hospitalized more patients
than did those with more than 3 years' experi-
ence. While this study does not explore atti-
tudes about mental illness directly, these can
be inferred from the observed clinical behavior 
noted above. 

In summary, attitudes about mental illness 
vary markedly between different categories of 
mental health workers and are related to the 
demographic variables of age and education,
which was not found consistently true for the
general public.


Attitudes of Patients and Their Relatives 


In addition to attitudinal studies of the 
general public and of mental health workers, 
attention has been devoted to the attitudes of 
mental patients and their families toward 
mental illness and related issues. (As used 
here, the term mental patient refers only to 
the hospitalized.) Overall findings suggest 
that mental patients' attitudes are like those
of nonpatients of comparable age, education,
and social class, and that the condition of
patienthood does not significantly affect
beliefs and judgments. Thus Giovannoni and
Ullman (1963), studying Veterans Adminis-
tration psychiatric patients, reported that
they were no better informed about mental
health and illness than the general public, and
their attitudes toward the mentally ill were
as highly negative as those of normals. Crump-
ton, Weinstein, Acker, and Annis (1967) re-
ported that while patients gave more favor-
able ratings than normals of concepts such
as mental patient and sick person, on se- 
mantic differential scales, both groups saw the 
mental patient in unfavorable terms. 

Manis, Houts, and Blake (1963) compared 
hospitalized psychiatric patients’ attitudes 
toward mental illness, as measured by a scale 
like that of Nunnally’s, with those of medi- 
cal patients and mental health professionals. 
They found no significant attitudinal differ- 
ences between medical and psychiatric pa- 
tients. Those with more education believed 
that mental patients were like normals in
appearance, and that mental illness is curable. 

Bentinck (1967) compared the OMI re-
sponses of 50 hospitalized Veterans Adminis-
tration schizophrenics and their relatives, 3
medical patients and their relatives, and Vet-
erans Administration hospital personnel.
Bentinck found that the schizophrenics were
less Benevolent and Socially Restrictive in
their attitudes than either their relatives or
hospital personnel. The attitudes of schizo-
phrenics' relatives resembled those of blue-
collar hospital personnel (as defined by Cohen
and Struening) rather than those of mental
health professionals, and they also tended to
come from the same social backgrounds as the
blue-collar hospital workers; both of these
groups were more pessimistic about treatment
outcome, more restrictive, and more authori-
tarian than were the therapists who treated
the patients.

Attitudes of relatives of mental patients
were also studied by Freeman (1961), who
used a standardized interview schedule with
mothers and spouses. He found that better-
educated relatives tended to hold more en-
lightened attitudes about mental illness, as
did younger relatives, but that social class
was not a significant factor. Freeman inter-
preted the correlation between education and
attitudes as a reflection of differential verbal
ability rather than differences in social class.
He also found that relatives' attitudes were
not influenced by duration of hospitalization,
number of hospitalizations, or diagnosis of
patient except on the question of recovery.
The patient's behavior after release from the
hospital did influence their families' attitudes
about the chances for complete recovery and 
the extent of the patient's responsibility for 
his behavior. 

Hollingshead and Redlich (1958), in con- 
trast to Freeman, did find striking social class 
differences in relatives’ attitudes about mental 
illness and their mentally ill members. As a 
rule, the authors observed, the lower the class, 
the greater the feelings of fear and resent- 
ment; the higher the class, the more pro- 
nounced the feelings of shame and guilt. Dur- 
ing treatment or hospitalization, resentment 
in the lower-class families was replaced by 
feelings of helplessness and apathy. In the 
three upper classes, such feelings were less 
marked, and interest in the sick member was 
stronger. The authors saw a connection be-
tween these different attitudes and the class
differences in proportion of hospitalized pa-
tients. They believed that, to a significant
degree, the attitude of the family toward its
sick member is responsible for the determina-
tion of who goes to the hospital, who improves
there, and who deteriorates and ends up on
a chronic ward. These findings of the signifi-
cance of social class in attitudes about mental
illness are not reconcilable with those of 
Freeman, but may be at least partially attrib- 
utable to differences in sample composition or 
definition of social class. 

The Souelem Scale has been used in several 
studies to gauge attitudes toward psychiatric
hospitals. Imre (1962) found hospital person- 
nel and volunteers more favorably disposed 
toward hospitals than were the patients. Imre 
and Wolf (1962) replicated these results and 
also noted that student nurses shared the pa- 
tients’ dim view of mental hospitals. Toomey, 
Reznikoff, Brady, and Schumann (1961) com- 
pared attitudes of patients and student nurses. 
They found that patients’ attitudes toward 
psychiatrists, psychiatric treatment, and out-
come became more positive with hospitaliza-
tion, but their attitudes toward hospitals re-
mained the same. In contrast, student nurses'
attitudes toward hospitals improved, but other 
attitudes did not. Souelem (1955) reported 
that most of the Veterans Administration psy- 
chiatric patients he studied with his scale 
held generally favorable attitudes toward hos-
pitals, although they were less enthusiastic
when responding to unstructured tests.
Brady, Zeller, and Reznikoff (1959) found 
that favorableness of attitude toward hos- 
pitals was correlated with successful treat- 
ment outcome. 

In summary, mental patients are as nega- 
tive in their opinions about mental illness and 
the "insane" as the general public. They ap- 
preciate the value of psychiatric hospitals less 
than hospital staff. Their beliefs about the 
nature of mental illness and proper manage- 
ment of mental patients resemble those of 
nonpatients of similar social and educational 
background. Since most hospitalized patients 
are of lower socioeconomic status than most 
mental health professionals, they are rela- 
tively more conservative, less tolerant, and 
more restrictive in their attitudes. 


Attitude Change as a Function of Classroom 
Training and Practical Experience 


While the delineation of attitudes toward 
mental illness can be of interest as an end in 
itself, it is most commonly undertaken in re- 
lation to efforts at attitude modification. That 
is, hospital and clinic administrators, super- 
visors, and instructors are eager to encourage 
more favorable attitudes among staff and stu- 
dents. They want to be able to ascertain atti- 
tudes held when workers first join the staff, in 
order to modify those which do not fit into 
the institution's prevailing beliefs and pro- 
cedures. Studies of attitude change through 
experience fall into two general categories: 
those observing the effects on attitudes of 
personal experience with mental hospitals and 
mental patients, and those measuring the im- 
pact of in-service training programs for staff 
members. Most of the following studies re-
port changes in responses to questionnaires;
few attempt to measure changes in overt be- 
havior. 

Studies of the effect on attitudes of direct 
contact with patients have generally focused 
on student nurses, although college student 
volunteer programs have also been surveyed. 
Gelfand and Ullman (1961a) compared the 
attitudes of student nurses assigned to a
psychiatric program with the attitudes of stu-
dent nurses in nonpsychiatric programs. Both
groups were given the OMI before and after
the experimental group had their psychiatric 
affiliation. This group became significantly 
less Socially Restrictive and less Authoritarian 
in their questionnaire responses, in comparison 
to the control group. In a similarly designed 
study by Lewis and Cleveland (1966), Au- 
thoritarianism was again found to be lowered 
after psychiatric experience with patients, but 
changes in scores on the other OMI factors 
were not significant. 

Hicks and Spaner (1962) studied more 
than 400 student nurses. Like Gelfand and 
Ullman (1961a), and Lewis and Cleveland 
(1966), they found that attitudes toward the 
mentally ill improved more after 12 weeks 
of psychiatric training, including classroom 
instruction and ward experience, than did 
those of a control group of nurses assigned to 
other medical areas. 

Johannsen, Redel, and Engel (1964) used 
the OMI, the CMI, and the California Psy- 
chological Inventory in their study of student 
nurses’ attitudes. As in other studies, the ex- 
perimental group consisted of nurses in psychi- 
atric affiliation programs, while the control 
group did not have this experience. The re- 
sults in a pre-post design showed that the rat- 
ings of members of the experimental group in- 
creased in the following areas: social presence, 
tolerance, psychological mindedness, and flexi- 
bility. On the CMI, experimental subjects 
showed a liberal attitude even before psychi- 
atric affiliation. Nevertheless, there was a 
shift toward further liberality for the experi- 
mental group and toward custodialism for the 
controls. On the OMI, the experimental sub-
jects became less Authoritarian, while the con-
trol group's scores increased. The authors
interpreted the changes as “an accentuation
of preexisting tendencies and beliefs” during
psychiatric affiliation. They commented also
that the expected change in Mental Hygiene 
Ideology scores did not occur for either group. 

Canter and Shoemaker (1960) found that
student nurses shared a stereotype of the
mental patient. Those with high scores on
Authoritarianism, as measured by the
California F Scale and the Rosenzweig Picture
Frustration Test, changed this stereotype
after training more than did students with
lower scores on Authoritarianism. Holtzberg
and Gewirtz (1963)
and Ralph (1968) found similar results using 
college students rather than nursing students. 
Holtzberg and Gewirtz compared undergradu- 
ates who volunteered to participate in a “com- 
panion” program at a nearby mental hospital 
with undergraduates involved in other social 
service activities in the community, such as 
the YMCA. The attitudes of “companions”
shifted significantly in a positive direction 
while those of the other students did not. 
Ralph obtained the same outcome comparing 
attitudes of students volunteering for a “com-
panion therapy” program and those volun-
teering for the traditional recreation pro-
gram with mental patients.

Holmes, observing staff and members of
community recreational centers, studied the
impact, on attitudes about mental illness, of
exposure to psychiatric patients from nearby
hospitals who participated in some social rec-
reational activities of the centers. Using OMI
and behavioral measures (e.g., attendance rec-
ords, complaints about the presence of pa-
tients, frequency of interactions between pa-
tients), he compared attitudes before and
after the introduction of this kind of pro-
gram within a given center, and also attitudes
of staff and members at centers with and
without such a program. Despite an excellent
design and statistical sophistication seldom
encountered in this area, no consistent atti-
tude changes were observed either between or
within centers, for staff or for community
members. Although age and socioeconomic
status were found to be strong sources of atti-
tudinal influence among members (older and
poorer respondents tending to be more au-
thoritarian, more socially restrictive, and less
benevolent than others), exposure to psychi-
atric patients had negligible effects on atti-
tudes or behavioral measures regarding mental
illness.

It would seem, from Holmes' negative re-
sults, that contact with patients alone does
not have as much effect on attitude change as
contact supplemented by classroom instruc-
tion. In contrast to most other studies,
Holmes' subjects did not perceive themselves 
as students who would be expected to show 
some change after a planned experience, either 
in the form of classroom or practical training. 

Using a different approach, Stotsky and 
Rhetts (1966, 1967) studied the current atti- 
tudes of nurses in relation to their experiences, 
during student days, with psychiatric patients 
and hospitals. The authors used the OMI to 
study attitudes of nursing home personnel
toward the mentally ill. Attitudes of nurses
working in homes where most ex-mental pa-
tients were successfully placed were compared
with those of nurses working in homes where
such placements were mostly unsuccessful.
Attitudes of the latter group were found to
be more Authoritarian, more Socially Re-
strictive, and less Benevolent, but these dif-
ferences disappeared when the variable of age
was controlled. It turned out that nurses in 
the “unsuccessful” homes were significantly 
older and had therefore received their psychi- 
atric training, if any, at a much earlier date. 
Most of them had spent time as students in 
large state mental hospitals that provided 
merely custodial care; these experiences were 
reported to be unpleasant and discouraging. 
In contrast, the younger nurses whose train- 
ing occurred after 1945 reported more favor- 
able experiences with mental patients since 
many had received their training in smaller, 
more progressive mental hospitals where 
treatment and recovery were emphasized. In 
short, Stotsky and Rhetts (1966, 1967) found 
that ex-mental patients were more apt to be 
successfully placed in nursing homes where
the nurses were younger and had more positive
attitudes about mental illness deriving from
more progressive training in their student
days. 

Several in-service training programs have 
been designed to change attitudes of hospital 
personnel, based on the assumption that dif-
ferences in attitudes make a difference in be-
havior and effectiveness in working with psy-
chiatric patients. Middleton's (1953) work
suggests that training ought to include a
thorough indoctrination about etiology, treat-
ment results, examples of successes, and rea-
sons for failure, with periodic repetition of
this training. In studying attitudes toward
the mentally retarded, as measured by a 
scale constructed for this purpose, Quay, 
Bartlett, Wrightsman, and Catron (1961) 
used three methods of presenting material to 
a group of attendants. They found that the 
formal lecture method was more effective in 
changing reported attitudes than either the 
discussion method or use of a booklet. A 
training method called the “remotivation
technique” was used by Long (1963) to pro-
vide attendants with “a structured yet flexible
method of helping patients toward reality.”
He expected this technique to increase the 
humanism (on the CMI) of attendants on the 
chronic wards in contrast to those on acute 
wards. He claimed some success, explaining 
that acute patients often seem to recover 
spontaneously and have relapses, while a 
change in a chronic patient is seen as a more 
significant event. 

Gerjuoy et al. (1963), on the other hand,
felt that attendants on the back (chronic) 
wards were the least motivated group since the 
atmosphere is one of little hope and little treat- 
ment. An Attendant Attitude Questionnaire 
was used to measure change during a 6-month 
intensive milieu therapy program including 
meetings with professional staff and special 
demonstrations of new techniques. Although 
there were no changes in the scored attitudes 
of attendants on the back wards after this 
special training, there were obvious positive 
changes in the patients' clinical conditions. 
However, there were positive changes in the 
scored attitudes of attendants on front (acute) 
wards even without the special program. The 
authors felt that their results suggest the value 
of periodic rotation of aides as a means of 
maintaining employee morale. They did not, 
however, explain the changes in patients’ con- 
ditions on the back wards, which may have 
resulted from change in the attendants’ be- 
havior even if there were no measurable 
changes in their questionnaire responses. This 
study challenges the notion of Long (1963) 
that changes in chronic patients are poten- 
tially more satisfying to aides than changes 
in acute patients, unless of course the aides 
on the back wards in Gerjuoy et al.'s study
were personally fulfilled without changing
their responses on the attitude questionnaire. 

Ellsworth (1965, 1968) has experimented 
with training aides and has succeeded in 
bringing about positive attitude change in 
many of them. His program was designed to 
involve aides in decision-making processes on 
the ward and in the steps required to imple- 
ment the decisions. Direct interaction between 
patients and staff was encouraged, to promote 
mutual respect, direct communication, and 
expectations for patients' improvements. It
subsequently became evident that the leader 
of the program was a crucial component to 
its success; under circumstances where the 
leader was less actively involved, the program 
was less effective. 

In summary, modification of attitudes about 
mental illness has proved feasible for several 
populations including student nurses, hospital 
aides, and other selected occupational groups. 
The critical ingredient in such endeavors 
seems to be some sort of interaction between 
personal confrontation with the mental hos- 
pital and mental patient and a supplementary 
educational program. In the studies noted, 
direct contact with patients typically contrib- 
uted to more tolerant and understanding at- 
titudes about mental illness when reinforced 
with formal instruction. 


Attitude Change as a Function of Academic 
Instruction 


In addition to these studies of attitude 
changes among mental health workers as a 
result of practical experience, often supple- 
mented by classroom training, several investi- 
gators have considered the impact of psychol- 
ogy courses on the attitudes of undergradu- 
ates. Although some positive results have been 
reported, the overall findings suggest that 
changes in attitudes about mental illness after 
taking psychology courses, when they occur,
are probably not related to the academic con- 
tent of the courses, but to such factors as the 
teacher’s attitude, or to the nature of the 
students’ abilities or ongoing belief systems. 
Thus, variables “in” the teacher or “in” the
student, rather than in the course, may
account for changes reported after enrollment
in psychology classes.

One of the few studies to report effective
attitude change due largely to the instruction 
received was conducted by Coston and Kerr 
(1962). The OMI was administered before 
and after an abnormal psychology course. 
The authors found that all women regardless 
of class rank, and those men in the upper 
half of their class, became less Authoritarian 
and less Socially Restrictive in their attitudes. 
Both men and women became less Benevolent
and more convinced of Interpersonal Etiology
as a cause of mental illness. These findings
were interpreted to suggest that the students 
gained a certain sophistication as a result of 
the course but that the better students were 
more open to favorable change than the oth- 
ers. 

Graham (1968) gave the OMI to students 
in introductory and abnormal psychology 
courses at the start and end of a 10-week 
term. Scores on the Interpersonal Etiology 
scale rose in both classes. In a similarly de-
signed study, Gulo and Fraser (1967) found
that Social Restrictiveness scores declined.

Altrocchi and Eisdorfer (1961), using a
similar design, did not obtain the expected
attitude changes among college students. In
a subsequent study, they found that nursing
students who had contact with patients during
their courses did show positive attitude
changes. The authors suggested that people
who are relatively well-informed to begin with
are not apt to revise their attitudes solely on
the basis of additional didactic information
about mental illness, but that personal in-
volvement with patients, specific training in
dealing with them, or supervision directed at
self-understanding may be necessary in addi-
tion to or instead of academic instruction.

In a study comparable to the second con-
ducted by Altrocchi and Eisdorfer, Iguchi and
Johnson (1966) also examined attitude change
as a function of the joint impact of academic
coursework and personal experience with men-
tal patients. Using a pre- and posttest design,
they contrasted the attitudes of students tak-
ing an abnormal psychology course with those
of students taking the same course in con-
junction with a volunteer “companion pro-
gram” at a local mental hospital. The authors
reported that students electing the companion 
program were more humanistic (as measured 
by the CMI) to begin with, but the attitudes 
of this group did not show significantly 
greater change than did those of the controls. 
The addition of patient contact added noth- 
ing beyond that gained by students in class 
alone, in terms of changes in questionnaire 
response. 

Dixon (1967) used the OMI to compare 
attitude changes after completion of psychol- 
ogy courses. He found some favorable attitude 
changes. Subsequent interviews with the in-
structors led him to conclude that the instruc-
tors’ attitudes had a greater effect on stu-
dents’ attitudes than did the content of the
text used. 

In short, several studies have succeeded in 
demonstrating the effectiveness of academic 
instruction in changing questionnaire-mea- 
sured attitudes about mental illness. In con- 
junction with other factors, such as personal 
experience with mental patients, it seems 
maximally effective but by itself is of value. 
These findings do not mesh with those of 
Cumming and Cumming (1957), Freeman 
and Kassebaum (1960), Casper (1964), and 
others who have reported that imparting in- 
formation about mental illness does not by 
itself alter attitudes of the general public. 


Relation between Attitudes and Behavior 


One of the most germane considerations in 
the study of attitudes about mental illness is 
their relationship to behavior, their impact 
on the effectiveness of those who deal with 
mental patients. Few investigators in this or 
any other area have been able to demonstrate 
a straightforward relation between attitude 
and behavior, or between attitude change and 
behavioral change, although this relationship 
is largely taken for granted by most social 
scientists. 

Attitudes are commonly viewed as pre- 
cursors or determinants of overt behavior. 
Supposedly, the verbal statements that are 
used to assess attitudes and overt acts are 
both mediated by the same underlying varia- 
ble. The effects of situational factors or other 
values and beliefs of the individual on a given
sample of behavior are seldom evaluated in
this context. 

While it is often difficult to find suitable 
overt behavioral measures with which to com- 
pare statements of attitudes, a variety of 
studies of this nature have been undertaken 
and carefully reviewed by Wicker (1969). He 
found in the published literature that atti- 
tudes are typically unrelated or only slightly 
related to actions, that correlations between 
attitudes and overt behavior are rarely above 
.30 and often near zero, and that only rarely 
can as much as 10% of the variance in overt 
behavioral measures be accounted for by 
attitudinal data. Even more unsettling is 
Wicker’s finding that in the studies he re- 
viewed, substantial proportions of subjects 
show striking discrepancies between their 
words and actions. 
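
The two figures Wicker cites are consistent with one another: the proportion of variance accounted for by a linear correlation is its square, so a correlation of .30 corresponds to about 9% of the variance. The following is only an illustrative sketch of that arithmetic; the function name is hypothetical and the values are the thresholds quoted above, not data from any of the reviewed studies.

```python
def variance_explained(r: float) -> float:
    """Proportion of variance accounted for by a linear correlation r (r squared)."""
    return r * r


# Correlations in the range Wicker describes: near zero up to about .30.
for r in (0.0, 0.10, 0.30):
    print(f"r = {r:.2f} accounts for {variance_explained(r):.0%} of the variance")
```

On this reading, a correlation would have to exceed roughly .32 before attitudinal data could account for 10% of the variance in a behavioral measure.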

Evidently, factors other than statements of 
attitudes play a major role in determining 
behavior. These may be roughly classified as 
personal and situational. Personal factors such 
as other attitudes, competing motives, or so- 
cial and intellectual limitations seem to be 
important largely in their interaction with 
situational factors, which include the presence 
or influence of other people, social norms and 
expectations, and the number of alternative 
behaviors available to someone in a given 
setting. Wicker concluded his review by urg- 
ing that researchers acknowledge the prob- 
lems involved in defining conceptions of atti- 
tudes and their relations to behavior. It seems 
tremendously important that those who see 
attitudes as indexes of overt behavior under- 
take the responsibility for demonstrating this 
relationship—a task that few have attempted 
and even fewer succeeded. 

One of the few investigations in the mental 
health field to demonstrate an unequivocal 
relation between staff attitudes and patient 
discharge patterns was conducted by Cohen 
and Struening (1964). For a sample of 12 
Veterans Administration Psychiatric Hospitals 
whose patient populations consisted largely of 
chronic schizophrenics, it was found that hos-
pitals characterized by an Authoritarian-Re-
strictive atmosphere as defined by mean OMI
scores of representative samples of their em-
ployees had lower rates of early discharge as
measured by the converse of length of hos-
pital stay. Staff attitudes were thus demon-
strably related to staff decisions regarding
patients’ length of hospital stay. 

In a study of student nurses, Gelfand and 
Ullman (1961a) considered achievement rat- 
ings, measured by a national test for nurses 
and school grades in theory and in clinical 
practice, in relation to changes in attitudes, 
measured by the OMI, as a function of psy-
chiatric experience. They found that students'
questionnaire responses became less Authori- 
tarian and less Socially Restrictive after psy- 
chiatric nursing experience. Students who 
obtained higher grades in theory were less Au- 
thoritarian, but no correlation was found be- 
tween Authoritarianism and practice grades. 
The authors (1961a, 1961b) concluded 
that although attitudes toward Authori- 
tarianism can be modified through prac- 
tical experience with mental patients, 
a change in behavior is not necessarily 
a concomitant of attitude change. Mor- 
ris (1964), using the same design, obtained 
similar results. She also found increased ac- 
ceptance of the concepts related to Interper- 
sonal Etiology but no change in scores on 
Mental Hygiene Ideology. Similarly, Canter 
(1963), who also used a sample of student 
nurses, found no relationship between atti- 
tudes toward mental patients and clinical 
effectiveness. However, he did observe that
high authoritarianism as measured by the
California F and Dogmatism scales was asso-
ciated with lower ratings in clinical effective-
ness, which suggests that the correlation be-
tween performance and endorsed attitudes de-
pends on what kind of attitude dimensions
are measured. Both Appleby et al. (1961)
and Gerjuoy, Rosenberg, Bond, McDevitt,
and Balogh (1963) found that a change of
procedure on the wards or extensive in-service
training led to positive changes in patients'
behavior without measurable changes in staff
attitudes, as shown on the OMI. Toomey et
al. (1961) did not find a relation between
verbally expressed attitudes and success in
psychiatric affiliation.

Some of the most extensive work concern-
ing the relation of attitudes to behavior among
psychiatric hospital personnel has been con-
ducted by Ellsworth (1965, 1968) and his
associates at the Roseburg Veterans Adminis- 
tration Hospital in Oregon. In one of their 
studies, aides and nurses were given two atti- 
tude questionnaires, the OMI and the Staff 
Opinion Survey (SOS). Their behavior was 
also rated on an interpersonal rating scale by 
patients who were carefully screened for their 
ability to perform this task appropriately. The 
relationship between endorsed attitudes of the 
staff and their behavior as perceived by pa- 
tients was thus examined. Three major find- 
ings emerged: staff members who endorsed 
Authoritarian attitudes were seen by patients 
as behaving more often in a controlling, re- 
stricting, and domineering way, in contrast to 
other staff members who rejected this point 
of view. The second finding was that staff 
members who endorsed the orientation of Pro- 
tective Benevolence (which the authors say 
is not the same as the OMI factor of Benevo- 
lence) were seen by patients as behaving 
more often in an aloof, distant, and cold 
manner. Finally, it was discovered that many 
of the same behaviors were attributed to staff
members endorsing either of these attitude do-
mains. That is, staff members who endorsed
either restrictiveness or protective benevo-
lence were described as showing a lack of
respect in their behavior toward patients.
Further analysis led to the delineation of a
third attitude dimension, which largely
accounted for this overlap. This is named Non-
traditionalism, and its positive pole is nega-
tively correlated with the OMI factor of Au-
thoritarianism. The Nontraditional staff mem-
ber believes that the patient is not the
victim of forces beyond his control, but that
he is able to change and that interaction with
him pays off. Ellsworth (1965) pointed out
that these three attitude dimensions (Au-
thoritarianism, Protective Benevolence, and
Nontraditionalism) seem to represent psycho-
metrically cohesive ways of viewing mental
illness, but they do not necessarily make a
difference in the individual's behavior or
occupational effectiveness. The author con-
cluded, as did Canter (1963), that the rela-
tion between endorsed attitudes and effective-
ness in patient rehabilitation depends on what
attitudes are being measured, the kind of de-
mands of the treatment situation itself (e.g.,
the prevailing treatment philosophy on a 
given ward), and the kind of patient being 
treated. Endorsement of one set of attitude 
statements rather than another is not neces- 
sarily indicative of superior talent in dealing 
with psychiatric patients; what matters is the 
way the individual's attitudes relate to the 
overall system in which he functions. 

In short, the relation between attitudes, as 
measured by questionnaires, and behavior, as 
rated by supervisors or patients, is far more 
difficult to establish than might have been ex- 
pected in terms of the widespread belief that 
attitudes directly affect behavior. 


Summary and Suggestions for Further
Research 


In summary, a variety of studies have been 
conducted in order to delineate attitudes to- 
ward mental illness, their amenability to 
change, and their relationship to behavior. 
Investigators have succeeded in describing 
the attitudes held by workers in the mental 
health field as well as those of the general 
public. Attitude change has been documented 
fairly well, but the relationship between atti- 
tudes and behavior requires further explora- 
tion. At the present, we may simply conclude 
that there seems to be no one-to-one relation 
between attitude and therapeutic effective- 
ness, but that certain attitudes may be effec- 
tive in certain treatment milieus with certain 
patients. This is an area that needs further
clarification. 

A major problem faced by workers in this 
area today is the absence of a measuring in-
strument of sufficient scope to encompass both
traditional and contemporary psychiatric 
ideologies. The attitude scales which have 
been most widely used—the CMI and OMI— 
were not designed to incorporate a social psy-
chiatric dimension, since this point of view
was not commonly held in the 1950s when the 
scales were designed. Baker and Schulberg's 
Community Mental Health Ideology Scale 
does cover this attitude dimension. It is, how-
ever, a single-factor measure of social psy-
chiatric beliefs and simply shows whether or 
not respondents endorse this point of view. 
The Nontraditionalism dimension derived 


from OMI and SOS items that Ellsworth de- 
fined seems compatible with the beliefs of 
social psychiatry and suggests that the OMI 
can effectively be extended to include this di- 
mension. 
After a brief description of the rationale, construction, and structure of the
Holtzman Inkblot Technique, this study critically reviews 10 years (1959- 
1969) of research with this instrument. An impressive body of positive findings 
is in evidence, but many more studies are needed in most areas covered by 
this review. The various technical refinements and extensions of this instrument 
that have been produced should facilitate the appearance of much needed 
future research. The dearth of relevant studies renders an evaluative comparison 
with the Rorschach impossible at this time. Implications of certain issues raised 
by R. Schafer in 1948, M. D. Ainsworth in 1954, and R. Holt in 1968 in 
relation to Rorschach validity studies are discussed in reference to research
with the Holtzman technique.


Although Holtzman recently provided sum- 
maries of research on the Holtzman Inkblot 
Technique (Holtzman, 1966, 1968), his pre- 
sentations were not evaluative, not much in- 
formation was given concerning the designs 
of the studies reviewed, and a number of 
studies were not included. The present review 
(a) presents a brief description of the ration- 
ale, construction, and structure of this instru- 
ment; (b) provides an evaluative review of
the literature from 1959 to 1969, inclusive; 
and (c) discusses the general character of 
this research in light of previous criticisms of 
Rorschach validity studies. It should be noted
that this review attempts to be primarily in- 
formative and secondarily evaluative. It is 
hoped that the inclusion of more than the 
usual amount of descriptive material with re-
spect to specific studies will facilitate the
future appearance of much needed additional
research that is indicated in several areas
covered by this review.

The authors of the Holtzman Inkblot Tech- 
nique (HIT) have described their goal as one 
of developing “a new inkblot technique having 
scores of demonstrated psychometric value
while still preserving the rich, qualitative 
essence of the Rorschach [Holtzman, Thorpe, 
Swartz, & Herron, 1961, p. 7].” It is pos-
sible to think of the HIT as a natural suc-
cessor to the important position occupied by 
its well-known predecessor, the Rorschach. 
This is not to suggest that deposition of the 
Rorschach is imminent or even highly likely 
at this time. Considering, however, that the
HIT is another inkblot technique constructed 
along lines established in the Rorschach tradi- 
tion, and further, that the newer instrument 
claims psychometric superiority over the 
Rorschach, the eventual replacement of the 
Rorschach by the HIT seems to be more than 
a remote logical possibility. In view of this
and the fact that the Rorschach is probably 
the most frequently used psychological test in 
clinics and hospitals, a review of research 
conducted with the HIT would seem to be of 
some importance at this point.

To fully appreciate the significance of the 
HIT, it is necessary to keep in mind the well- 
documented deficiencies of the instrument that 
it is most likely to replace, the Rorschach. 
Probably in their most exaggerated form, 
these deficiencies have been presented by 
Zubin (1954) and quoted in Holtzman et al. 
(1961) as follows: 


(1) failure to provide an objective scoring system 
free of arbitrary conventions and showing high
inter-scorer agreement; (2) lack of satisfactory in-
ternal consistency or test-retest reliability; (3) fail-
ure to provide cogent evidence for clinical validity;
(4) failure of the individual Rorschach scoring cate-
gories to relate to diagnosis; (5) lack of prognostic
or predictive validity with respect to outcome of
treatment or later behavior; (6) inability to differ-
entiate between groups of normal subjects; and (7)
failure to find any significant relationships between
Rorschach scores and intelligence or creative ability
[p. 5].


Without the qualification that this list 
represents an exaggerated view, one would 
find it difficult to reconcile the fact that the 
Rorschach is the most frequently used instru- 
ment in clinical practice today with the as- 
sumption that at least a majority of clinicians 
would be reality oriented enough to recognize 
its deficiencies and abandon such a faulty 
instrument! Perhaps some of the cognitive 
dissonance generated by the consideration of 
these facts and assumptions can be reduced if 
we recognize with Holt (1968), for example,
that maybe most of the validity studies re-
ported have missed the whole point of
Rorschach usage and the interpretations it
generates. Maybe there is no effective means
of quantifying the clinical insight that the
"master testers" seem to have. Still, and with-
out rejecting the values implied in Holt's
analysis of research with projective tech-
niques, one can welcome an effort to improve
psychometrically a method (such as the
Rorschach) even if this endeavor may lead, 
in the final analysis, to a new method bearing 
little resemblance to the “wisdom” from which 
it sprang. 


STRUCTURE AND STANDARDIZATION
OF THE HIT


The HIT consists of two “parallel” sets, 
each containing 45 inkblots that were selected 
and matched on the basis of item analyses of 
hundreds of experimental blots. An important 
consideration in the selection of a given blot
was its ability to elicit small detail, space,
color, and shading determined responses from
a subject. Attention to these variables ex-
plicitly places the HIT squarely within the
tradition begun by Rorschach, in which it is
assumed that percepts organized on the basis
of different determinants may be taken to re-
flect inter- and intraindividual variations in
personality organization. Another reason for
selecting blots with strong “pulling power”
with respect to the several locations and 
determinants relates to one alleged point of 
psychometric superiority of the HIT over the 
Rorschach: the requirement of the HIT that
only one response be given to each card. The
latter provision was made to counteract the 
confounding of response frequency with the 
frequency and variety of various locations and 
determinants in a subject's record. Since only 
one response is allowed on the HIT, and, 
since experience has shown that form-deter- 
mined wholes are the usual first responses to 
inkblots, special attention to “pulling power" 
with respect to other inkblot variables may be 
viewed as a necessary compensatory measure. 

The final selection of a blot for inclusion 
in the final two forms of the HIT (A and B) 
was based on three empirical criteria: (a) Its 
ability to discriminate between a group of 
college student volunteers and a group of
hospitalized psychotics; (b) the amount that 
a blot contributed to the total scores on sev- 
eral variables (such as location, color, shad- 
ing, movement) in each sample; and (c) 
inter- and intrascorer reliability. The results 
of the initial standardization study were most 
encouraging. Realizing the gross (or just sim- 
ply inadequate) nature of the distinction be- 
tween college students and hospitalized psy- 
chotics, it is still impressive that, on all 
variables measured, the groups were clearly 
differentiated and in a manner that is highly 
consistent with previous Rorschach findings: 
Scores for the college group were significantly 
higher on Location, Form Appropriateness, 
Form Definiteness, Movement, and Shading. 
No significant differences were found on mean 
Color scores, but greater variance was ob- 
served in the psychotic group. Split-half re- 
liability coefficients were found to range from 
a low of .31 to a high of .96 with most 
falling in the .70s and .80s (Holtzman et al.,
1961). 
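
A split-half coefficient of the kind reported above can be illustrated with a minimal sketch. It assumes an odd-even split of the cards and the Spearman-Brown step-up to full test length; the source does not state which split or correction Holtzman et al. used, and the function names and the small data set are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores holds one list of per-card scores for each subject.

    Assumption: cards in odd and even positions form the two halves.
    """
    odd_totals = [sum(cards[0::2]) for cards in item_scores]
    even_totals = [sum(cards[1::2]) for cards in item_scores]
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical scores of four subjects on six cards for one variable.
scores = [
    [0, 1, 0, 1, 2, 1],
    [2, 2, 1, 2, 2, 2],
    [1, 0, 1, 1, 0, 0],
    [2, 1, 2, 2, 1, 2],
]
print(split_half_reliability(scores))
```

With the actual instrument each subject's list would contain 45 card scores for the variable in question.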

As Holtzman described them, the blots that 
were finally selected 


cover a wide range of stimulus variation, giving the 
individual ample opportunity to reveal certain as-
pects of his mental processes and personality by
projecting his thoughts onto otherwise meaningless 
inkblots. Twelve of the inkblots in Form A are black 
or gray, two are monochromatic, eleven are black 
with a bright color also present, and the remaining 
twenty are multicolored. Most of the blots have rich 
shading variations which help to elicit texture re- 
sponses. A similar distribution of color, shading, and 
form pes is present in Form B [Holtzman, 1968, 
p. 139]. 
Except for the marked asymmetry of many
HIT blots, it is interesting to note the es- 
sential similarity between the stimulus prop- 
erties of the HIT and those of the traditional 
Rorschach blots. (Is this new bottles for
old wine?) 


Variables Derived from HIT Responses 


The following 22 variables comprise the 
scoring system developed for the HIT (Holtz- 
man, 1968): 

Reaction Time (RT)—the time, in seconds, 
from presentation of the inkblot to the begin- 
ning of the primary response. 

Rejection (R)—score 1 when the subject 
returns the inkblot to the examiner without 
giving a scorable response.

Location (L)—tendency to break down the 
inkblot into smaller fragments; score 0 for
use of the whole blot, 1 for use of a large 
area of the blot, 2 for use of smaller areas 
of the blot. 

Space (S)—score 1 for responses involving 
a figure-ground reversal where white space 
constitutes the figure and the inkblot is the 
ground. 

Form Definiteness (FD)—a 5-point scale 
ranging from a score of 0 for a concept having 
completely indefinite form (“squashed bug") 
to a score of 4 for highly specific form (“man 
on horse").

Form Appropriateness (FA)—goodness of
fit of the form of the concept to the form of
the inkblot; score 0 for poor, 1 for fair, and
2 for good form.

Color (C)—[...] primary [...] present (as in
the Rorschach [...]).

Shading (Sh)—importance of shading or
texture as a determinant: score 0 when not
used, 1 when used only in [...].


Movement (M)—a 5-point scale for measur-
ing the degree of movement, tension, or
dynamic energy projected into the percept by
the subject, regardless of content; score 0 for
none, 1 for static potential (sitting, looking, 
resting), 2 for casual movement (walking, 
talking), 3 for dynamic movement (dancing, 
weeping), and 4 for violent movement (whirl- 
ing, exploding). 

Pathognomic Verbalization (V)—a 5-point 
scale ranging from 0 (no pathology present) 
to 4 (very bizarre verbalizations) for mea- 
suring the degree of disordered thinking repre- 
sented by fabulations, fabulized combinations, 
queer responses, incoherence, autistic logic, 
contaminations, self-references, deteriorated 
responses, and absurd responses.

Integration (I)—score 1 when two or more
adequately perceived blot elements are or- 
ganized into a larger whole. 

Human (H)—score 0 for no human con-
tent present, 1 for parts of human beings,
featureless wholes, or cartoon characters, 2 for 
differentiated humans or the human face if 
elaborated. 

Animal (A)—score 0 for no animal content, 
1 for animal parts, and 2 for whole animals.

Anatomy (At)—score 0 for no penetration 
of the body wall, 1 for X rays, medical draw- 
ings, or bone structures, and 2 for viscera or
soft internal organs. 

Sex (Sx)—score 0 for no direct sex refer-
ences, 1 for socially accepted sexual activity
and expressions (“buttocks,” “kissing”), and
2 for blatant sex references (“penis”).

Abstract (Ab)—score 0 if no abstract con-
cept is present, 1 if abstract elements are
secondary, and 2 if the response is wholly
abstract, for instance, “Reminds me of happi-
ness.”


Anxiety (Ax)—a 3-point scale for rating 
the degree of anxiety apparent in the content 
of the response as reflected in feelings or atti- 
tudes (“frightened animal"), expressive be- 
havior (“girl escaping”), symbolic responses 
(“dead person”), or cultural stereotypes of 
fear (“witch”): score 1 when debatable or 
indirect, and score 2 when clearly evident. 

Hostility (Hs)—a 4-point scale for rating
degree of hostility apparent in the content of
the response, with increasing score as hostility
moves from vague or symbolic expressions to
more direct, violent ones in which human
beings are involved.




HOLTZMAN INKBLOT TECHNIQUE 1 


Barrier (Br)—score 1 for reference to any 
protective covering, membrane, shell, or skin 
that might be symbolically related to the 
perception of body-image boundaries. 

Penetration (Pn)—score 1 for concepts 
symbolic of body penetration. 

Balance (B)—score 1 where the subject 
expresses concern for the symmetry-asym- 
metry dimension of the inkblot. 

Popular (P)—score 1 if a popular response 
is given, popular responses being defined sta- 
tistically for specific areas of the inkblots in 
earlier normative studies of the HIT. 

The decision to include a variable in the 
HIT scoring scheme was based on several 
considerations (Holtzman, 1966; Holtzman 
et al., 1961): (a) Could most of the tradi-
tional Rorschach scores be derived from the
new system? (b) Is it theoretically possible
to have any score from 0 to 45? (c) Can high
scoring agreement be reached among trained
individuals? (d) Is the variable relevant to
the study of personality? and (e) Are the
variables logically (but not necessarily em-
pirically) independent of one another? It
seems that all of these conditions have been
met. Rorschach Whole (W), Determinants
(D), or Common Detail (d) would be Loca-
tion 0, 1, or 2, respectively. Rorschach Human
Movement (M) is Movement scored 2 or
higher. Rorschach Entire Human (H) is Hu-
man 1 or 2. Rorschach FC, CF, or C would
correspond to the number of chromatic cards
coded 1, 2, or 3, respectively, for Color. Item
b above is evident from the structure of the
scoring scheme, and Item c is a matter of
data (discussed below). Item d needs to be 
decided on the basis of validity research, and 
Item e seems to be a matter of judgment. 
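
The correspondences just listed between HIT codes and traditional Rorschach scores can be expressed as a short tally routine. This is only a sketch: the per-card dictionary layout and the function name are hypothetical, and the rules are taken literally from the paragraph above (for example, Movement of 2 or higher is counted as Rorschach M without checking for human content).

```python
from collections import Counter

# Correspondences stated in the text: Location 0, 1, 2 -> W, D, d and
# Color 1, 2, 3 -> FC, CF, C. A Color code of 0 has no Rorschach entry.
LOCATION_TO_RORSCHACH = {0: "W", 1: "D", 2: "d"}
COLOR_TO_RORSCHACH = {1: "FC", 2: "CF", 3: "C"}


def rorschach_equivalents(cards):
    """Tally Rorschach-style categories from per-card HIT codes.

    Each card is a dict with integer codes under the keys "L", "M",
    "H", and "C" (a hypothetical layout chosen for this illustration).
    """
    tally = Counter()
    for card in cards:
        tally[LOCATION_TO_RORSCHACH[card["L"]]] += 1
        if card["M"] >= 2:  # Movement scored 2 or higher
            tally["M"] += 1
        if card["H"] >= 1:  # Human scored 1 or 2
            tally["H"] += 1
        color = COLOR_TO_RORSCHACH.get(card["C"])
        if color is not None:
            tally[color] += 1
    return dict(tally)


# Three hypothetical cards from one protocol.
example_cards = [
    {"L": 0, "M": 3, "H": 2, "C": 0},
    {"L": 1, "M": 1, "H": 0, "C": 2},
    {"L": 2, "M": 2, "H": 1, "C": 1},
]
print(rorschach_equivalents(example_cards))
```

Because only one response is given per card, each tally is a count of cards rather than a count of responses, which is the sense in which these derived scores are free of response-frequency effects.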


Reliability 

To be sure, the characteristic of the HIT 
that stands out most is its improvement, 
reliability-wise, over other inkblot techniques. 
Though reliability estimates are shown to 
vary markedly from sample to sample (Kobler 
& Doiron, 1968), variations can usually be 
attributed to skewness, a restriction in the 
range of variation in a sample, or chance. 
And, in general, the magnitude of the co- 
efficients is quite acceptable. Interscorer cor- 
relations range from .89 to .995 when highly 
trained scorers are used to evaluate individual
protocols. Even examiners whose training is 
limited to little more than a reading of the 
examples in the scoring guide produce inter- 
scorer correlations ranging from .73 to .89. 
Intrascorer consistency ranges from .89 to .97. 
Test-retest stability has been shown to range 
from .39 (for P) to .82 (for L) over a period 
of one week. Test-retest correlations obtained
when the HIT was administered at an interval 
of one year ranged from .24 (P) to .75 (L) 
(Holtzman et al., 1961). Holtzman believes
that the average stability coefficient observed 
over several studies is high enough so as to 
provide stable measures of personality phe- 
nomena, but low enough to indicate that the 
HIT is sensitive to normal variations in per- 
sonality over time (Holtzman et al., 1961). 


Technical Refinements and Extensions 


Gorham (1967) reported on the reliability 
and validity of computer-scored (Moseley, 
Gorham, & Hill, 1963), group-administered 
HIT protocols on a group of 145 college stu- 
dents. The 17 HIT variables studied were L, 
R, FD, C, Sh, M, I, H, A, At, Sx, Ab, Ax,
Hs, Br, Pn, and P. The criteria for computer- 
score validation were an expert scorer's values 
obtained when protocols were scored in the 
usual manner. The correlation between the 
average of three hand scorers and the com- 
puter scores either equaled or approached the 
interscorer reliability of the hand scorers. 
Overall, the correlations between hand scores 
and computer scores ranged from a low of 
.54 (Ab) to a high of 1.00 (R). An additional
validity index reported is the fact that a 
factor analysis of computer scores resulted in 
eight factors that were nearly identical to 
factors obtained when hand-scored protocols 
were analyzed in this way. 

Norms exist for the computer-scored HIT 
variables L, R, FD, C, Sh, M, I, H, A, At,
Sx, Ab, Hs, Br, Pn, P (Gorham, Moseley, & 
Holtzman, 1968) on over 5,000 subjects in- 
cluding high school and college students and 
United States Navy enlistees; clinical sub- 
jects include Veterans Administration and
state hospital depressives, schizophrenics, psy-
choneurotics, alcoholics, and chronic brain-
syndrome patients; cultural samples include
university students from Argentina, Australia,
Colombia, Denmark, Germany, Mexico, Hong
Kong, Hungary, India, Japan, Lebanon, Ni- 
geria, Panama, Turkey, Venezuela, and Yugo- 
slavia. The usefulness of these norms as ap- 
plied to individually administered and scored 
inkblots should be evident from the finding 
that for the various methods, correlations have 
either equaled or approached the magnitude 
of interscorer reliability (Gorham, 1967; 
Holtzman, Moseley, Reinehr, & Abbott, 
1963). 
In addition, Herron (1963) has shown that 
a short form (first 30 items) of the group- 
administered HIT produces means and stan- 
dard deviations for most variables that are 
highly similar to those observed on the 45-item 
test. Using a criterion setting internal stability 
at .70, then, the variables R, L, FD, FA, C,
V, and H can be used with confidence to 
detect group differences using the short form. 
The availability of these technical exten- 
sions should increase the attractiveness of the 
HIT "package," especially for research in- 
volving group comparisons. It can probably
be assumed that the time required to ad- 
minister and score the Rorschach has deterred 
many researchers from using this instrument. 
The high degree of concurrent validity shown 
in the computer-scored, group-administered, 
and short form of the HIT should facilitate 
the inclusion of inkblot variables in studies 
where they might not appear under other 
circumstances. Psychological researchers are,
it has been said, only human, and this alone 
may predispose them to seek out relatively 
quick and easy measures for research. (The 
Taylor Manifest Anxiety Scale is probably 
legendary in this respect.) 
A technical refinement of a different sort
is implied in a study by Megargee (1966).
It will be recalled that one of the deficiencies
of the Rorschach that the HIT sought to over-
come was the contaminating influence that
variable response productivity has on other
variables. The solution provided by the
HIT involves limiting the subject to one
response per card so as to rule out the influ-
ence of this factor at least on a single card.
It is questionable, however, that this proce-
dure has solved this problem. Megargee
(1966a) pointed out that though response
frequency is controlled in the HIT (a sup-
posed psychometric advantage over the
Rorschach), there is still the possibility that 
productivity (of which response frequency is 
an index) may not be controlled as response 
length is still free to vary. Response length 
(RL) refers to the mean number of words 
used in each scorable response. Megargee 
(1966a) correlated RL with all of the HIT 
variables in two samples (84 college students 
and 75 male juvenile delinquents) and found 
several highly significant positive relationships 
in both samples between RL and M, Ab, Ax, 
Hs, and Br. The pervasive influence of RL 
was shown when this variable was factor 
analyzed along with the other HIT variables 
and found to load (.71) on a most important 
factor of the HIT (Factor I, defined by M, 
I, B, H, and P and accounting for more vari-
ance than any other factor).

A second study (Megargee, 1966a) sought 
to determine more precisely the relationship 
between RL and M, an important index of 
personality functioning. A group of high-M 
subjects and a group of low-M subjects (on 
the basis of HIT, Form B) were further sub- 
divided into groups encouraged to give long 
responses (30 words) and groups encouraged 
to give short responses (10 words), on Form 
A. These groups were then assessed for amount 
of M. Manipulating RL in this way had a 
highly significant impact (p< .001) on M 
production. The low-M-short group produced 
a mean M of 12.33 on the second administra- 
tion, high-M-long produced 71.20, high-M- 
short produced 25.53, low-M-long produced 
43.67. These effects are clear-cut and strong. 
Megargee (1966) pointed out that RL may 
be an important variable entering into many 
relationships between inkblot perception and 
personality, including intelligence, experi- 
mental manipulations, and examiner effects. 
Needless to say, future investigations using 
the HIT must be regarded as inadequate un- 
less they come to terms with the “problem” 
of RL. In another vein, it is conceivable, in 
view of Megargee's (1966a) experimental 
findings, that RL will turn out to be an
important variable in its own right. 
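
The size of the response-length effect in Megargee's second study can be worked out from the four cell means quoted above. The sketch below assumes equal cell sizes, so that unweighted marginal means are appropriate; the source does not report the group sizes, and the variable and function names are hypothetical.

```python
# Cell means for Movement (M) on the second administration, as quoted
# in the text. Keys are (initial M level, instructed response length).
cell_means = {
    ("high", "long"): 71.20,
    ("high", "short"): 25.53,
    ("low", "long"): 43.67,
    ("low", "short"): 12.33,
}


def marginal_mean(factor_index, level):
    """Unweighted mean of the cells at one level of one factor.

    Assumption: equal n per cell, which the source does not confirm.
    """
    values = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(values) / len(values)


def effects():
    """Main-effect differences and the interaction contrast."""
    rl_effect = marginal_mean(1, "long") - marginal_mean(1, "short")
    m_effect = marginal_mean(0, "high") - marginal_mean(0, "low")
    interaction = (
        (cell_means[("high", "long")] - cell_means[("high", "short")])
        - (cell_means[("low", "long")] - cell_means[("low", "short")])
    )
    return {
        "response_length": rl_effect,
        "initial_M": m_effect,
        "interaction_contrast": interaction,
    }


for name, value in effects().items():
    print(f"{name}: {value:.2f}")
```

Under the equal-n assumption, instructing subjects to give long rather than short responses shifts mean M by roughly 38.5 points, nearly twice the difference of about 20.4 points between the groups originally selected as high and low in M.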


Direct Comparisons with the Rorschach 


The initial comparison between the HIT 
and the Rorschach is reported in Holtzman 
et al. (1961). In this study, HIT and
Rorschach protocols (scored by Beck's sys- 
tem) of 72 high school students were com- 
pared on eight variables judged to be com- 
parable in the two systems. When Rorschach 
scores were adjusted for response frequency, 
the correlations between these two instruments 
were found to range from .30 (FA) to .79 
(A), and all correlations were significant be-
yond the .01 level. Extensive multivariate
analyses of these data were carried out by 
Bock, Haggard, Holtzman, Beck, and Beck 
(1963). A canonical regression analysis 
(judged by these investigators as most rele- 
vant to the question of Rorschach-HIT equiv-
alence) revealed that 60% of the stable
canonical variation in the Rorschach scores 
was predictable by 14 HIT scores (8 HIT 
scores had to be eliminated because of a
high number of zero frequencies). Thus, 
while it is apparent that the Rorschach and 
HIT have much in common, it is also evident 
that these two techniques are far from mea- 
suring the same variables as well, and it is 
questionable whether or not the same dimen- 
sions on both tests ought to be given the 
same interpretations. 

A somewhat unusual comparative study by 
Otten and Van de Castle (1963) compared 
Form A of the HIT with the Rorschach on a 
semantic differential, using bipolar adjectives 
such as clean-dirty, active-passive. The 10 
Rorschach cards were mixed in with the HIT 
cards, and the subjects responded to each 
card on the differential. The HIT cards were 
found to cover all of the semantic space cov- 
ered by the Rorschach, and some patterns 
emerged on the HIT that were not present on 
the Rorschach. It is interesting to speculate 
that the latter find is related to the effort to 
put more “pulling power" in the HIT cards 
to compensate for the one-response require-
ment.

Whitaker (1965) correlated Pathognomic 
Verbalization (V) scores on these two tests 
administered to 45 psychiatric inpatients. 
The HIT method of scoring V was applied to 
the Rorschach protocols as well. Independent 
scorers were enlisted to analyze a subset (N =
19) of these protocols. The correlation ob- 
tained between Scorer A’s Rorschach with 
Scorer B's HIT scores was .76 (p< .01). 
Scorer A’s Rorschach and HIT scores for the
entire sample of 45 patients correlated .94. 
The latter coefficient is undoubtedly somewhat 
inflated due to the operation of criterion con- 
tamination as only one scorer was involved. 
On the other hand, the correlation of .76 may 
be somewhat attenuated because of examiner 
unreliability due to criterion disagreement. In 
either case, Rorschach and HIT V appear to 
have much in common. 
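
The remark that the .76 coefficient may be attenuated by scorer unreliability can be made concrete with the classical correction for attenuation, r_xy / sqrt(r_xx * r_yy). The sketch below is illustrative only: Whitaker's report, as summarized here, gives no reliability for V, so the value of .89 (the lower bound quoted earlier for trained interscorer agreement) is a purely hypothetical stand-in, and the function name is not from the source.

```python
from math import sqrt


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimate the correlation between true scores from an observed
    correlation and the reliabilities of the two measures."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / sqrt(rel_x * rel_y)


observed = 0.76             # cross-scorer Rorschach-HIT correlation for V
assumed_reliability = 0.89  # hypothetical; not reported for V in the source

estimate = correct_for_attenuation(observed, assumed_reliability, assumed_reliability)
print(f"disattenuated estimate: {estimate:.2f}")
```

Lower assumed reliabilities would push the estimate higher, so any corrected value depends heavily on how well scorer agreement for V is actually known.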

In view of the effort to maintain some of 
the Rorschach tradition in the construction of 
the HIT, there are surprisingly few studies 
effecting direct comparisons between the two 
techniques. The studies that have been car- 
ried out are all positive in one sense or an- 
other, but it is clear that, even aside from the 
psychometric differences, the instruments are 
not that similar. Actually, the kind of com-
parison that is needed—one in which diag-
noses rather than scores are compared—has 
not as yet been reported in the literature. If 
the correlations between the test scores were 
higher than they are, this would obviously be 
unnecessary. But since the scores do not 
correlate that perfectly, a diagnostic compara- 
tive study appears essential. Besides the direct 
comparison of diagnoses, other interesting 
comparisons could be made as well. For ex- 
ample, what of the dimensions that these two 
techniques do not have in common? Do they 
have additional diagnostic significance? Or, 
if the diagnostic formulations turn out to be 
much the same for the two techniques, is 
there more stability in the HIT diagnosis on 
repeated tests? In general, a great deal of 
comparative work that could have been done 
with these instruments has not been done. 


Examiner and Set Influences 


Experimental evidence from several sources 
agrees that “examiner” variables have a 
marked effect on certain HIT variables. 

Hamilton and Robertson (1966) randomly 
assigned 90 college students to “warm,” 
“cold,” and “neutral” conditions of examina- 
tion, the same examiner serving under all 
three conditions. Significant differences were 
found in scores on FD, FA, M, I, H, and 
Word Productivity. The “warm” condition 
resulted in the most productive protocols, and 
the “cold” condition resulted in the least pro- 
ductive protocols with respect to these vari-
ables.

Simkins (1960) reported a study designed 
to test the influence of the presence and de- 
gree of examiner reinforcement on HIT re- 
sponses. Subjects were matched as closely as 
possible with respect to similarity of content, 
location, and determinants given on an initial 
administration of the HIT. Three matched 
groups were obtained in this way and given 
one of the following treatments during a 
second testing session: strong reinforcement 
for a given response category (“very good”), 
weak reinforcement (“um-hmm”), or no re-
inforcement. Scores termed by Simkins as D
scores were computed for each subject based 
on the change in the number of responses in 
a reinforced category from the first to the 
last session. Another difference score, D', was 
based on the difference between scores for 
matched pairs under strong- and weak-rein- 
forcement conditions. Weak reinforcement was 
found to be more effective in both the location 
and content categories, while strong reinforce- 
ment was most effective with determinant 
scores. There was no significant difference,
however, between weak and strong reinforce- 
ment. In comparison to matched controls, the 
content and determinant dimensions showed 
the largest reinforcement effects, and location 
was most resistant to reinforcement-induced
change. 

Marwit and Marcia (1967) studied the 
effect of “experimenter bias” on the number
of responses (not usually free to vary) given 
by college students to five selected HIT cards.
The examiners were 36 volunteer students 
from an undergraduate course in experimental 
quence, an increasing function “learning”
curve appeared with significant linear (p< 
.01) and quadratic (p < .05) trends in the 
high-expectancy groups. This finding suggests 
that a learning process (in the experimenter 
or subject, or both!) may be mediating ex- 
aminer effects. 

A somewhat different kind of “examiner” 
influence was studied by Herron (1964). This 
investigator administered the HIT to college 
students (a) under the standard instructional 
set and (b) as a test of intellectual ability. 
Under the second condition, significant de- 
creases were found in Pn, Hs, A, and V and 
a slight (but not highly significant) increase 
in FA. These results were interpreted as re- 
flecting a slight “tightening-up” of cognitive 
process under the intelligence test set.

To be sure, the findings of studies dealing 
with examiner influences are interesting in 
their own right. Still, they may be criticized 
for their failure to address themselves to 
broader issues in the assessment of personality 
through inkblot perception. Knowledge that 
examiner and set variables influence specific 
scores is certainly useful information. These 
findings remind us of the necessity of strict 
standardization of the test situation when
other influences are our main concern. It must
be remembered, however, that these studies 
reflect experimentally induced as opposed to 
natural examiner differences, and generaliza- 
tions from the former to the latter cannot be 
made without qualifications. Further, the as- 
sertion that diagnoses or personality descrip- 
tions are affected by examiner variables does 
not follow from the fact that individual scores 
are so affected. To demonstrate that diag- 
noses, for instance, are a function of examiner 
influences requires studies in which diagnoses 
rather than scores are the dependent variable.

STUDIES OF EXTERNAL VALIDITY

Developmental Changes 


Organismic developmental theory (Werner,
1948) views the development of living systems
as proceeding in the direction of increasing
differentiation of part functions and
processes, followed by increasing hierarchic
integration achieved by the subordination of 
these part functions. As applied to inkblot
perception, significant developmental trends
may be predicted when responses are defined 
in terms of these principles. 

Thorpe and Swartz (1965) reported a de- 
velopmental study of HIT responses that 
used the developmental principles outlined 
above. Five criterion age groups from 5 to 22 
years of age were isolated, and care was taken 
to insure that each group contained equal 
numbers of males and females (N = 586). All
10 HIT variables studied showed significant 
age-group differences (most at p < .001), but 
there were no significant sex differences nor 
were there any significant Sex × Age interactions.
Six variables (FA, FD, I, M, H, and
Sh) showed steadily increasing means with
increasing age. The changes observed in FA,
FD, and I are in keeping with Werner's notion
that cognitive development proceeds in a 
direction away from loosely organized and 
amorphous percepts. The increases in H and
M probably reflect an increase in the inte- 
grative capacities of the organism in the area 
of perceptual cognitive development. The 
systematic increase in Sh responses probably
reflects an increasing sensitivity to very subtle 
aspects of the stimuli and an ability to inte- 
grate these with form quality to produce 
richer percepts. Increases in L followed by a 
decline in L in the oldest group follow the
trend of increasing differentiation followed by 
integrative efforts (W responses) resulting in 
the actual lowering of L in the most advanced 
group—theoretically a sign of hierarchic inte- 
grative efforts. Animal responses went up and 
down in seesaw fashion in the groups studied. 
V first declined and then rose again, as did C.
These latter trends are more difficult to interpret
developmentally, and attempts at theorizing
should be contingent upon their replication
in other studies.

In a follow-up study (Thorpe & Swartz, 
1966) that involved a partial replication of 
a previous study (Thorpe & Swartz, 1965), 
the HIT was administered individually to 
normal subjects who were 6.7, 9.7, and 12.7 
years of age. On the basis of the develop- 
mental trends found earlier, eight HIT vari- 
ables were selected for study: FA, FD, I, M, 
H, C, Sh, and V. Significant age-group differences
were found again by analysis of variance
for all variables except Sh. The only sex
difference to emerge was on H, where females
had slightly higher scores than males. In line
with the earlier study, FA, FD, I, M, and H
showed increasing mean scores across the age
groups employed. At variance with the earlier
data, V was found to decrease with age. As in 
the earlier study, L showed a decline from 6.7 
to 9.7 years of age and then a slight increase 
at age 12.7. 

Swartz, Lara Tapia, and Thorpe (1967) 
provided intercultural validation for the de- 
velopmental trends observed in the studies 
described above. In this study, 300 normal 
Mexican school children living in Mexico City 
were divided into criterion-age groupings of 
6.7, 9.7, and 12.7 years. Of the 11 HIT variables
studied, RT, FD, FA, C, M, I, and H
showed significant (p < .001) age-group differences.
No Sex × Age interactions were
found, but females at each age level were 
found to have a higher mean L score (p < 
.01) and a lower M score (p < .01) than 
males. All of the variables showing significant 
age effects with the exception of C showed 
steadily increasing means as a function of age. 
Five variables in this study (FA, FD, M, I,
and H) showed developmental trends entirely
consistent with those observed in other sam- 
ples drawn from the United States. One trend, 
not observed in other samples, was the linear 
increase in RT in the Mexican sample. This 
may turn out to be an important cultural 
deviation that future research ought to clar- 
ify. In the main, this study provided strong 
support for the developmental significance of 
trends found in common with earlier studies. 

Sanders, Holtzman, and Swartz (1968) car- 
ried out an excellent longitudinal study of 
developmental trends in the HIT C variable. 
On the basis of previous results and theory, C 
responses (“3” in the HIT system) should 
predominate in early childhood, CF (HIT 
“2”) in later childhood, and FC (HIT “1”) 
in adolescence and adulthood. The 323 sub- 
jects in the study were divided into three age 
groups: 6.7, 9.7, and 12.7 years. These groups 
were observed over a 10-year period (from 
5.7 to 15.7 years of age). As predicted, sig- 
nificant declines in C were observed as a func- 
tion of age. Significant monotonic increments 
in CF and FC with age were not observed,
however.

Taken together, these studies provide strong
support for the view that certain HIT scores
provide reliable indexes of developmental
changes in cognitive organization. The nature 
of these changes and the particular variables 
reflecting this change strongly support Wer- 
ner’s (1948) notion that cognitive develop- 
ment proceeds along lines of increasing differ- 
entiation and integration, but certain details 
of the trends may require additional assump- 
tions. The decline and resurgence of V found 
in one study, for example, may reflect the 
capacity to integrate somewhat pathological 
elements in basically normal personalities, or 
it may reflect stresses of an emotional nature 
brought on by puberty. Ignoring slight ambiguities
in the interpretation of minor aspects
of the data, the consistency of findings
with respect to theoretically relevant and de- 
velopmentally meaningful conceptions is note- 
worthy. A logical next step for research in 
this area to take would involve testing sub- 
jects in middle and old age to determine 
whether the HIT is sensitive to the reverse
trends of decreasing differentiation and hier- 
archic integration predicted by organismic 
developmental theory. 


Cross-Cultural Studies 


Knudsen, Gorham, and Moseley (1966) 
tested the universality of the HIT variable P 
by group administration of the HIT to sub- 
many, Hong Kong, Den- 
ed States. The criterion
agreement on these dimensions among individuals
with diverse backgrounds (psychologists
and American, Mexican, Chinese, and
German students). Structural ambiguity was 
assessed by ratings on a 3-point scale (high to 
low), and interpretive ambiguity was mea- 
sured by the total number of different words 
given in response to each HIT card by each 
of four samples of 100 subjects from Mexico, 
Germany, China, and the United States. 
Group administration was used (Swartz & 
Holtzman, 1963), and scoring of the proto- 
cols was carried out by means of the com- 
puter program developed by Gorham (1967). 
The structural ambiguity ratings of the four 
student samples correlated significantly on 
all cards (.70-.80), and the correlation between
the psychologists' ratings and the
average pooled student ratings was .90. The 
extent of agreement on interpretive ambiguity 
is reflected in the correlations among the 
various national samples. These relationships 
were positive and only moderately high (.44-.55)
but all significant. The average correlation
between structural and interpretive ambiguity
was -.35 (p < .01). As the authors
pointed out, these findings suggest that structural
ambiguity is a concept that may be
culture free, as indicated by the between- and 
within-cultural consistencies observed. The 
findings with respect to interpretive ambiguity 
are not as strong, but they tend to point in 
the direction of a fair amount of intercultural 
agreement.

These cross-cultural studies strongly support
the intercultural validity of certain processes
involved in inkblot perception. Other
studies are finding interesting intercultural
differences in HIT performance as well.


that | 6—12-year-old 


Mexico City School, 
She found consistent and stable differences 

The differences that 
s are interesting. and 
ated. In view of the 
tural commonality ob- 


responses, any consistently observed
differences are bound to deserve extra
attention (for a comprehensive account of
longitudinal and cross-cultural research with 
the HIT currently in progress, see Holtzman, 
Diaz-Guerrero, Swartz, & Lara Tapia, 1968). 


Cognitive Processes 


Data obtained from the HIT standardiza- 
tion sample of seventh-grade children showed 
significant correlations from .20 to .31 be- 
tween measures of general mental ability and 
R (inverse), L, M, FA, Sh, I, Ax, Hs, and Br.
Thorpe and Swartz (1963) replicated the positive
relationship between mental ability and
R in low (mean of 12 rejections) and high 
(mean of 6 rejections) groupings of seventh- 
grade children. Similarly, Holtzman, Gorham, 
and Moran (1964) found significant positive 
correlations between vocabulary and I, M,
and FA in chronic, paranoid schizophrenics 
(details of this study are presented below), 
and Holtzman (1968) reported an average 
correlation of .27 between vocabulary levels 
of school children and M, FD, I, H, Hs, Br, 
and P. Whether or not these relationships hold 
up in cognitively more stable normal or neu- 
rotic adults remains to be determined—and 
ought to be. 

Young (1959) intercorrelated 10 cognitive 
personality measures, some of which were 
measures of the field dependence-independence
construct. Two measures, a “coping”
score and an “introspective” score, were derived
from the HIT. For males, low introspectiveness
was associated with low-analytic
ability (r = -.29) but not with field dependence,
and low-coping ability was associated
with field dependence (.32 and .44 on two
tests). For females, low introspectiveness was
associated with field dependence (.30), and 
low-coping ability was associated with field 
dependence (.43) and with low-analytic abil- 
ity (—.36). The manner in which these re- 
sults were reported made it impossible to de- 
termine exactly what HIT variables were in- 
volved in these relationships. 

In a study comparing three groups of col- 
lege students differing only in verbal quanti- 
tative discrepancies in ability, Sanders, Mef- 
ferd, and Brown (1960) found no significant 
differences on HIT variables, with the excep- 
tion of RT. Subjects with consonant verbal 
quantitative abilities were found to have the
longest RT, followed in order by low-verbal-
high-quantitative subjects and high-verbal-
low-quantitative subjects. Subjective impressions
of the examiners, however, correctly
placed 60% of the subjects in the three groups 
based only on the HIT responses, but this 
mode of data analysis was not elaborated by 
the authors. 

In a study of convergent and divergent 
thinking in talented adolescents, Clark, Veld- 
man, and Thorpe (1965) included 16 HIT 
variables in an analysis of several measures 
related to these cognitive styles. There were 
no significant main HIT effects associated 
with convergent thinking, but L, M, Ax, Hs 
(p < .01), C, and Pn (p< .05) successfully 
discriminated between high and low groups in 
divergent thinking. High-divergent thinkers 
used larger blot areas (low L) and produced 
higher scores on all of the other five variables 
mentioned. These results were interpreted as 
indicating that divergent thinkers are capable 
of giving freer rein to their imaginations when 
given the opportunity. In addition, however, 
these same subjects were more responsive to 
the stimulus characteristics of the inkblots 
(low L, high C), so that their imaginativeness 
was not achieved at the sacrifice of reality 
contact. This pattern is very reminiscent of 
typical findings in Rorschach studies of cre- 
ativity. 

Richter and Winter (1966) found significant
differences between high- and low-creative
potential (Myers-Briggs Type Indicator)
female college students (matched for
age and verbal ability) on FD (p < .01), C
(p < .001), M (p < .0005), H (p < .005),
I (p < .05), V (p < .025), Ax (p < .0005),
Hs (p < .025), and Ab (p < .05). The high-
creative potential group achieved higher 
scores than the low-creative group on all of 
these variables. Interestingly, L (reflecting 
whole or detail responses) did not differenti- 
ate these groups. One reason offered by the 
authors to explain the latter finding is that 
the HIT does not differentiate between com- 
plex, accurately perceived wholes reflecting 
creative ability and whole responses that do 
not require much in the way of the superior 
integrative capacities that highly creative per- 
sons are said to possess. 

Using Megargee's (1966a) finding that response
length (RL) is a significant mediating
(or confounding) variable, Gray (1969) tried 
to determine whether HIT measures of pri- 
mary process (an important component of 
creative functioning according to many psy- 
choanalytic-oriented writers) would still cor- 
relate significantly with measures of creativ- 
ity when RL was partialed out. Holt’s (1963) 
system was used to operationally define pri- 
mary process, and six tests of divergent think- 
ing comprised a creativity battery. The cor- 
relation between these measures was .23 (p 
< .05). When productivity scores (RL on 
the HIT and number of responses on the cre- 
ativity tests) were partialed out, the correla- 
tion between primary process and creativity 
fell to .06. Thus, the correlations that have 
been observed in th 


bles were very likely due to a third variable, 
productivity, And 
to the HIT studies, it is probably doubly true 
in the many studi i
number of respons
to produce many
tween inkblot inde
also (more creatively)


lenge to future in 
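
The effect of partialing out productivity can be illustrated with the standard first-order partial correlation formula. In the sketch below, only the zero-order correlation of .23 comes from the text. The two correlations with productivity are hypothetical values chosen for illustration, since they are not reported here, and a single productivity variable stands in for the two productivity scores that Gray (1969) actually partialed out.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("undefined when x or y correlates perfectly with z")
    return (r_xy - r_xz * r_yz) / denom


# Zero-order correlation between primary process and creativity (from the text).
r_primary_creativity = 0.23
# Hypothetical correlations of each measure with productivity (not from the source).
r_primary_productivity = 0.45
r_creativity_productivity = 0.40

adjusted = partial_corr(
    r_primary_creativity, r_primary_productivity, r_creativity_productivity
)
print(f"zero-order r = {r_primary_creativity:.2f}, partial r = {adjusted:.2f}")
```

Under these assumed values the partial correlation is much smaller than the zero-order one, which is the pattern the text describes: a modest observed correlation can be largely carried by a shared third variable.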
responses referring to objects with well-defined
boundaries ("turtle with a hard shell" for 
example) are counted and summarized in a Br 
score. Arthritics have typically been found to 
score relatively high on this scale. By con- 
trast, ulcer patients, who tend to project their 
body exterior as weak and vulnerable, score 
low on the Br scale but relatively high on a 
Pn scale (scored for responses like “bleeding
wound”).

Though earlier studies of body image utilized
the Br and Pn scales in conjunction with
the Rorschach test, more recent investigations
have utilized the HIT. Cleveland and Fisher's
data (1960), using the HIT, upheld earlier
investigations with the Rorschach by finding
significantly higher Br scores in arthritics as
opposed to ulcer patients (p < .001), while
the reverse trend was found on Pn scores (p
< .05-.10). A hostility score derived from the
HIT responses failed to differentiate these
groups, but in comparison with norms reported
on normal subjects, both of these groups were
high in projected hostility. Incidentally, consistent
with theoretical formulations concerning
arthritics, this group was found to have a
significantly higher tendency to engage in
vigorous competitive sports (p < .001) and
unusually strong interests in cooking activities
(p < .001) (all subjects, by the way,
were male Veterans Administration inpatients).

and these were related to 


" sub- 
g antly more 
active, indepen- 


isher (19 
Y inducing ; 
in normal males, HIT AP ke nag 


changes 
a test-retest basis 


res assessed on 
the criteria in 




this study. In between the first and second 
HIT administrations, subjects were asked 
either (a) to direct their attention to their 
skin and muscles, (b) to focus on the interiors 
of their bodies, or (c) not to focus on the 
body. Exterior-focusing males did not show a 
significant increase in Br (unlike females), 
but interior-focusing males did show a signifi- 
cant decrease in Br (unlike females). When 
the data for females (Fisher & Renik, 1966) 
and males (Renik & Fisher, 1968) were com- 
bined, the exterior-focus condition was found 
to significantly increase Br when compared to 
the interior-focus condition, but not as com- 
pared to the control condition. The interior- 
focus condition did produce a significant Br 
decrement when compared to the control con- 
ditions. In comparing the results of these two 
studies, it is interesting to note that females 
became "exterior" more easily, while males 
became "interior" more easily. The explana- 
tion offered for this sex difference is that 
females are more practiced in the alteration 
of the body surface (use of makeup, adorn- 
ments, etc.) than are males. When men are 
found who have a tendency to ornament their 
body surface by tattoo (Mosher, Oliver, & 
Dolgan, 1967), they are found to have signifi- 
cantly higher Br scores on the HIT than men 
who are not so inclined. In both cases (fe- 
males and tattooed males), increased Br seems 
to reflect a tendency toward boundary-
strengthening preoccupations. (No significant
sexual deviations were found in the men.) 

A single dissenting set of findings (Hart- 
ley, 1967) found no significant relationships 
between HIT Br scores and site of symptoms 
(internal versus external) in a sample of col- 
lege students. Caution must be exercised in 
interpreting these results, however, as evi- 
dently none of the reported symptoms were 
serious enough to require hospitalization, and 
45 of the 83 subjects studied could not en- 
dorse symptoms in “often” or “serious enough 
to require medical treatment" categories. 

Taken together, these studies reinforce the 
significance of the body-boundary construct
in personality research, and they tend to sup- 
port the construct validity of the HIT Br and 
Pn scores. Other studies relevant to these 
HIT dimensions are presented elsewhere in 
this review.

Empathy


Mueller and Abeles (1964) utilized the HIT 
to test the relationship between the production
of human movement responses and the
capacity for empathy in advanced students 
in clinical and counseling psychology. The 
HIT responses were scored for M, H, and FA. 
Several components of empathy were as- 
sessed. The findings showed that perceived or 
projected movement is significantly related to 
only one component of empathy, the accuracy 
with which others perceive the subject’s be- 
havior. In view of this result, the “high M” 
person was felt to be a person who makes 
more information about himself available for 
appraisal by others—an interesting but rather 
unusual notion of “empathy.” It would seem 
that this study could have benefited consid- 
erably by the inclusion of nonclinical control 
groups (e.g., experimental students, graduate 
physicists), as one possible explanation for 
the generally weak findings may be the restric- 
tion in the range of variation of the empathy 
variable by using clinical students on the one 
hand, and the restriction in range of H, M, 
and FA that must have occurred because of 
the intellectual level of this sample.

Fernold and Linden (1966) tested the hy- 
pothesis that the HIT variable H is posi- 
tively related to social isolation and functional 
pathology (schizophrenia). Contrary to the 
advice of Holtzman et al. (1961), human, 
human detail, and humanlike responses re- 
ceived equal credit in assigning an H score. 
Using a group sociometric technique for as- 
sessing social isolation and empathy, no rela- 
tionship was found between these variables 
and H. When social interest scores on the 
Strong Vocational Interest Test were related 
to H, a significant relationship was found and 
in the predicted direction. To test the relationship
between H and psychopathology, a
group of twenty-one normal 23-55-year-old
firemen were compared to a group of twenty
21-57-year-old chronically hospitalized male
schizophrenics. The number of responses
given by these two groups differed significantly
(p < .01) and in the predicted direction.


M and H are indeed important variables in 
Rorschach lore, and it would be important to
have information on the validity of these as
assessed by the HIT. The two studies pre- 
sented here can only be taken as suggestive 
for reasons of design already discussed. An- 
other criticism to be made of these studies— 
indeed of many validation studies of both the 
HIT and the Rorschach—is that they are 
poorly conceived. As Ainsworth (1954) 
pointed out, “there are no unqualified interpretative
hypotheses attached to these values as
such [p. 417].” Yet researchers continue to 
design studies as if M, H, or any other de- 
terminant is supposed to reveal some psycho- 
logical predisposition. There is nothing 
“wrong” with these studies per se—as long 
as their results are not interpreted to reflect 
upon the validity of Rorschach usage as it is 
actually carried out in clinical practice. This
criticism can be made of most of the studies
reviewed herein.


Aggression 


In comparisons of delinquent with nonde- 
linquent adolescents and extremely delinquent 
with less delinquent subjects, Megargee 
(1965b) set out to test the relationship be- 
tween HIT Br scores and aggressive behavior. 
The delinquent sample in this study was 
found to have a mean Br score significantly 
lower than the nondelinquents represented in 
Holtzman et al.'s (1961) samples (p < .001).
As a control on RL (known to be significantly
here must represent a doubly extreme case,
as the normal adolescents they were compared 
with themselves had lower Br scores than 
most other nonclinical groups (Holtzman et 
al., 1961).

In a study primarily directed toward ex- 
panding the normative data on the HIT, 
Megargee (1965a) compared HIT protocols
of 75 male delinquents with those of non- 
delinquent adolescents reported by Holtzman 
et al. (1961). The latter group comprised a 
sample of seventh and eleventh graders. Ac- 
cording to age criteria alone, the delinquent 
group would be expected to fall midway be- 
tween these groups. Instead, they were signifi- 
cantly lower than the eleventh graders on 
most variables and identical to or lower than 
the seventh graders, the only exceptions to
this being R and At scores on which the delinquents
were significantly higher than the
eleventh graders and slightly higher than the
seventh graders (the .01 level of significance 
was maintained throughout). One way of 
characterizing these results is to describe the 
delinquent sample as grossly immature in its 
makeup, and this corresponds favorably to 
the salient features of delinquent behavior 
usually reported.

In another normative study, Megargee 
(1966b) compared the HIT protocols of white 
and Negro male juvenile delinquents who 
were matched for mental age. Three HIT 
variables significantly differentiated whites 
from Negroes: V (whites higher, p < .03), At
(Negroes higher, p < .04), and P (whites
higher, p < .02). Three other variables
showed noteworthy trends, but statistical sig- 
nificance was not achieved (whites were 
higher on RT, FA, and C). Interestingly, 

perform differently on 
the Thematic Apperception Test (TAT) or 
the Rosenzweig Picture-Frustration Study. 

In another study, Megargee (1966c) dichotomized
a group of male juvenile delinquents
into a moderately assaultive and an
extremely assaultive group and compared the 
subgroups on a variety of measures including
the HIT. The general hypothesis of this study
was that the extremely assaultive group
would be lower on measures of aggressiveness
and higher on measures of control than the
moderately assaultive group and groups of
nonassaultive delinquents selected for control
purposes (incorrigibles, property offenders). 
The main findings of this study with respect 
to the HIT were as follows: (a) Comparison 
groups did not differ significantly on a Hs 
scale derived from the inkblot responses; (b)
the prediction that the extremely assaultive
group would be highest on an M - C index
(presumably a measure of overcontrol) was
tested by simply subtracting each subject's C
score from his M score. This hypothesis was
supported as the extremely assaultive group
was significantly higher on M - C than all
three of the other comparison groups (p <
.061) and higher than the moderately assaultive
group in particular (p < .059). As an
additional check on the overcontrol hypothe- 
sis, pure C responses were tabulated for all 
groups, and again the extremely assaultive 
group differed significantly from the moder- 
ately assaultive groups (p < .045), the former 
producing fewer pure C responses. (The dif- 
ference between extremely assaultive and the 
other groups combined was significant only at 
p < .111, but all differences were in the predicted
direction, showing relative overcontrol
in the extremely assaultive group.) If for no
other reason than the fact that an attempt
was made to conceptualize the relationship 
between the HIT variables and the predicted 
behavior in some psychologically meaningful 
way, this study is to be applauded. 
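
The overcontrol index described above is simple enough to sketch directly: each subject's C score is subtracted from his M score, and the group means of the resulting index are compared. The group labels follow the text, but the (M, C) score pairs below are invented purely for illustration and are not data from Megargee (1966c); no significance test is attempted.

```python
from statistics import mean


def m_minus_c(m_score: int, c_score: int) -> int:
    """Overcontrol index: human movement (M) minus color (C)."""
    return m_score - c_score


# Hypothetical (M, C) score pairs per subject; not taken from the study.
groups = {
    "extremely_assaultive": [(30, 8), (26, 10), (28, 6)],
    "moderately_assaultive": [(22, 14), (25, 15), (20, 12)],
}

# Mean M - C index for each group.
index_means = {
    name: mean(m_minus_c(m, c) for m, c in scores)
    for name, scores in groups.items()
}

for name, value in index_means.items():
    print(f"{name}: mean M - C = {value:.2f}")
```

A higher mean on the index in the extremely assaultive group is the direction the overcontrol hypothesis predicts.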

Megargee and Cook (1967) correlated a
number of HIT-derived aggression indexes as 
well as TAT-derived aggression scales to ob- 
server and self-report ratings of aggression in 
76 juvenile delinquents. Complex patterns of 
relationships were found depending on the 
scale and the overt criterion used for measur- 
ing aggression. In general, the TAT scales 
were found to relate most closely to school 
conduct (preoffense behavior), while the HIT- 
derived scales related more closely to physical 
(not verbal) aggression assessed after arrest
during interviews and by means of direct observation.
The Holtzman et al. (1961) hostility
scale correlated -.24 (p < .05) with “total
physical aggression” and .25 (p < .05)
with a global rating of aggression (low scores 
indicating greater aggressiveness) but .03 
with verbal aggression. One must concur with 
the authors' conclusion: “For the clinician
who might wish to use these scales in the prediction
of overt aggression in the individual
case, the results are quite discouraging 
[Holtzman et al., 1961, p. 58]." It is neces- 
sary to add, however, that this study provided 
no information on nondelinquents’ aggression 
for purposes of comparison. It is possible, too, 
that the restriction of range in aggression due 
to such a select sample may have attenuated 
important relationships in this behavior do- 
main. A design that ought to be attempted in 
this area is one in which unselected normals 
who are both high and low on aggression (ink- 
blot assessed) are further subdivided into 
manifest aggressive and controlled groups for 
exploratory study on the factors that mediate 
between fantasy and overt aggression. 
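
The restriction-of-range argument can be made concrete with a small simulation. The sketch below assumes a hypothetical population in which an inkblot-derived score and overt aggression correlate .50, then keeps only cases high on overt aggression, as a delinquent sample might be. The population correlation, the cutoff, and the sample size are all assumptions chosen for illustration and have no basis in the studies reviewed.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rho = 0.5               # assumed population correlation (hypothetical)

pairs = []
for _ in range(5000):
    overt = rng.gauss(0.0, 1.0)  # overt aggression, standardized
    inkblot = rho * overt + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    pairs.append((overt, inkblot))

# Correlation in the full, unselected sample.
full_r = pearson_r(*zip(*pairs))

# Keep only cases more than 1 SD above the mean on overt aggression.
restricted = [(x, y) for x, y in pairs if x > 1.0]
restricted_r = pearson_r(*zip(*restricted))

print(f"full sample r = {full_r:.2f} (n = {len(pairs)})")
print(f"restricted sample r = {restricted_r:.2f} (n = {len(restricted)})")
```

In a run of this kind the correlation within the selected subgroup is noticeably lower than in the unselected sample, which is why unselected normals are suggested as the more informative design.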


Diagnosis 


Many of the studies included in this section 
are not, strictly speaking, studies of differen- 
tial diagnosis in the sense of diagnostic effi- 
ciency. Considering the latter type of diag- 
nostic study, there is a surprising dearth of 
information concerning the diagnostic validity 
of the HIT. Several studies, though not di- 
rectly concerned with diagnostic efficiency, 
provide data bearing on important aspects of 
psychiatric classification, so they are pre- 
sented in this section. 

Schizophrenia. In order to determine the
major dimensions underlying schizophrenic 
thought processes, Holtzman et al. (1964) fac- 
tor analyzed the intercorrelations among HIT 
scores, taken from several perceptual cogni- 
tive tests, age, education, and length of ill- 
ness. The subjects were 99 chronic, paranoid 
schizophrenic men who had been hospitalized 
for from 20 months to 22 years. Of the eight 
factors needed to account for the common 
variance, only those that were found to have 
substantial loadings on the HIT variables are 
presented here. A factor called Integrated 
Ideation was defined primarily by M, I, Br, 
and R (negatively). Stimulus Sensitivity was 
defined positively by C, Sh, and B, and negatively
by FD. Length of illness had a loading
of -.40 on this factor, which points to a
relationship between chronicity and sensitivity.
Pathological Verbalization was defined by Ax,
Hs, and V. Though Conceptual Autism had 
its highest loading on an object-sorting task
revealing an “open-private” orientation (abstract
sortings having private or idiosyncratic
meanings), the HIT variables Ad, At, Sx, FA 
(negatively), Affect Arousal (AA) (since dis- 
carded from the HIT), and Pn also defined 
this factor. Sexual Concern was found to be a 
small but sharply delineated dimension in this 
sample, and it was defined primarily by Sx, 
AA, and S with some loading on L and Pn 
as well. The authors pointed out that this 
factor has not appeared in analyses of data 
from normal populations. In general the re- 
sults of this study were consistent with other 
studies of schizophrenic thinking that have 
used divergent measures. In the interest of 
questions of the concurrent factorial validity 
of the HIT, it should be emphasized that 
common variance shared by both the HIT
variables and the other measures was such as 
to reinforce the pathological significance of 
very high or very low scores on several of the 
HIT variables. 

As part of the standardization of the HIT 
(Holtzman et al., 1961), data were obtained 
on chronic paranoid schizophrenics, depressed 
neurotics and psychotics, and mentally re- 
tarded subjects. As compared to normal ref- 
erence groups, the chronic schizophrenics 
were found to obtain significantly higher 
scores on R, V, Pn, At, Sx, and significantly 
lower scores on L, FD, FA, Sh, M, I, Br, and 
D. 

Utilizing 16 HIT scores, Moseley (1963) 
performed a discriminant function analysis 
and developed a formula for the classification 
of schizophrenics, depressiv 
When applied to schizophr 
mals, classification was 88
depressives, correct classific
mals.
and BAB sequence of the alternate forms was
used). All protocols were scored for Br and 
Pn. A significant rho of .60 was found be- 
tween morbidity ranking and Px for predrug 
versus fifth-week testing, and a rho of .61 
was obtained for predrug versus thirteenth- 
week testing. Br scores yielded no significant 
correlations in this analysis. By the second d 
terion (case disposition), the same trend is 
observed and reflected in significantly greater 
decrements in Px scores in discharged pa- 
tients as compared to patients retained in the 
hospital. In an unusual way, these findings 
tend to support other diagnostic studies 2 
volving schizophrenics. The changes in Pn 
were interpreted as reflecting “firming up 
and defining” of body image boundaries ds 
function of therapeutic improvement. € 
sumably, this represents an improvement 7 
that aspect of schizophrenic behavior ir 
has been described as “the inner experience 9 
personal dissolution at the periphery zi 
|Szasz, 1957, p. 127 |." If this is so, it is s 
zling that increases in Br were not found 
be associated with clinical improvement. - d 
As part of a study of empathy (described
elsewhere in this review), Fernold and Linden
(1966) found that hospitalized male
schizophrenics gave significantly fewer H responses
than a comparison group of normal males
(p < .01).

Process-reactive distinction. Becker (1956)
suggested that process and reactive
schizophrenics may be differentiated on the basis of
cognitive developmental principles (Werner,
1948). Based on the salient features of the
process syndrome that relate to a relatively
undifferentiated personality structure (as
opposed to a differentiated and
hierarchically differentiated personality;
Werner, 1948), Becker (1956) predicted that
process schizophrenics would show more
regressive and immature cognitive processes on
the Rorschach than would reactive
schizophrenics. Using the conception that
development proceeds in the direction of increasing
differentiation and integration (Werner,
1948), a genetic-level scoring system was
developed for Rorschach responses. At the
extreme undifferentiated end of the scale, for
example, are Amorphous and Minus
responses, as well as Confabulations and
Perseverations. At the highly differentiated
end of the scale are Whole and Detail re- 
sponses of high form level. Intermediate to 
these are Vague Whole responses and medi- 
ocre Detail responses. With the Elgin Prog- 
nostic Scale as a measure of the process-reac- 
tive dimension, Becker’s (1956) hypothesis 
was supported. Separate analyses were made 
of male and female schizophrenics and the 
Elgin scale, and the Rorschach genetic-level 
scores were found to correlate —.599 (p< 
.01) for the men and —.679 (p < .001) for 
the women. More recently, Steffy and Becker 
(1961) replicated this finding using the HIT. 
The correlation between Elgin scores and ge- 
netic level derived from the HIT was —.36 
(p < .05) on a sample of 36 hospitalized
schizophrenics. When duration of hospitaliza- 
tion was partialed out to control for lower 
Elgin ratings and improved inkblot perform- 
ance (both a function of length of hospitali- 
zation), the correlation rose to —.46. The 
authors concluded that the HIT shows promise 
of producing good measures of degree of 
pathology in schizophrenia. 
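
The partialing step described above can be sketched with the standard first-order partial correlation formula. This is only an illustration: the value −.36 is the correlation reported in the text, but the two correlations with duration of hospitalization are hypothetical placeholders (the review does not report them), so the result is not meant to reproduce the −.46 figure. The function name is likewise an assumption made for this sketch.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# r_xy = -.36 is the Elgin / genetic-level correlation given in the text.
# r_xz and r_yz (correlations with duration of hospitalization) are
# hypothetical values chosen only to show the arithmetic.
r_partial = partial_correlation(r_xy=-0.36, r_xz=0.30, r_yz=0.30)
print(round(r_partial, 2))
```

Whether partialing raises or lowers the correlation depends on the signs and sizes of the two correlations with the controlled variable, which is why the placeholders above should not be read as estimates.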

Using a different measure of the process- 
reactive continuum, Ullman and Eck (1965) 
extended these findings to male schizophrenics 
who were either discharged from the hospital 
or leaving on a trial visit (N = 48). The
HIT variables used in this study were V, I,
and FA. When V, I, and FA were added
together to form an inkblot summary score, the
correlation between the process-reactive
measure and inkblot summary score was found to
be .47 (p < .001). (The correlation in this
study is in the positive direction because of 
the scoring procedures used.) These findings 
corroborate those found earlier, but it should 
be noted that this sample was atypical, and 
the relationship between the HIT variables 
and the genetic-levels hypothesis was not dis- 
cussed, although (based on the results of de- 
velopmental studies discussed above) the use 
of Integration and Form Appropriateness 
would indicate that some developmentally 
relevant dimensions were being tapped by 
the inkblot summary score measure. 

Degree of autism. Using a somewhat com- 
plex design, Hill (1966) studied the degree of 
affect aroused by chromatic HIT cards as a 
function of stimulus strength (“brilliance”
rated by 26 hospital staff members) and
degree of autism (“high” and “low” as
assessed by selected scales of the Sixteen
Personality Factor Questionnaire). The
dependent variable in this study, “affect arousal,”
was operationalized in terms of ratings of 
nonverbal behavior, Holtzman's discarded AA 
scale, and FD. Analysis of variance of the FD 
scores found stimulus strength and degree of 
autism to be significant variables. More spe- 
cifically, FD to strong colors was lower than 
FD to weak colors, and FD was lower for 
high-autistic subjects than for low-autistic 
subjects. On low stimulus-strength cards, the 
high- and low-autistic subjects did not differ. 
With respect to AA scores, strong color cards 
produced greater affect arousal than weak 
cards, and high-autistic subjects scored higher 
than low-autistic subjects in AA. Also, degree
of autism interacted significantly with stimu- 
lus strength (p < .05) indicating that low-to- 
high shifts in stimulus strength had more im- 
pact on high-autistic subjects than on low- 
autistic subjects. The trends in the overt be- 
havior ratings were identical to those with 
respect to AA scores, except that there was no 
difference between the groups under conditions 
of low stimulus strength of color. The effects 
observed in all three dependent variables 
support the conclusion that stimulus strength 
and degree of autism and the interactions be- 
tween these variables are important determin- 
ers of color responses on the HIT. The inter- 
action effects may also be interpreted to mean 
that some HIT cards are more discriminating 
with respect to degree of autism than others. 

Organicity. Only one study reported the use
of the HIT to differentiate between normal
and brain-damaged individuals (Barnes,
1964). It is claimed (Holtzman, 1968) that 
Barnes could discriminate these groups with 
80% accuracy. The difficulty with this con- 
tention is the fact that many other measures 
were used besides the HIT in a multivariate 
design by this investigator, and the manner in 
which the study was reported did not allow 
one to assess the precise contribution of the 
HIT to differentiation or whether the discrim- 
inating variables were defined a priori or post 
hoc. More details about this study are needed 
before final evaluation can be made, however.

Anxiety and neuroticism. Megargee and
Swartz (1968) correlated scores on the
Extraversion (E) and Neuroticism (N) scales of
the Maudsley Personality Inventory (MPI) 
with the 21 scores of the HIT. The 89 sub- 
jects (40 women and 49 men) in the study 
were undergraduates of the University of 
Texas, and group administration procedures 
were used for both the HIT (Swartz & Holtz- 
man, 1963) and the MPI. The resulting 19 
scores (four HIT variables were dropped be- 
cause of extreme skewness) were subjected to 
a principal-axis analysis. Contrary to predic- 
tions, no significant correlations were found 
between HIT C, M, or H and Extraversion. 
Thus it appears that extraversion-introversion 
does not mean the same thing when assessed 
by the HIT and the MPI. (The authors claim 
that their prediction was made on the basis of 
“inkblot lore.”) It is interesting to note, how- 
ever, that Klopfer, Ainsworth, Klopfer, and 
Holt (1954) insist that introversion-extraversion
is a different concept from the inkblot-derived
introversive-extratensive. In view of
this confirmation of the difference between 
these constructs, it would be interesting to 
determine how they differ in other respects. 
There were several significant correlations be- 
tween MPI Neuroticism and HIT variables: 
R (−.24), FA (−.23), M (.23), V (.23), Ax
(.31), and Hs (.24). Though none of these
correlations are very high, they are significant 
and all in the expected direction considering 
inkblot hypotheses of neuroticism. The factor 
analysis carried out in this study to deter- 
mine whether any of the HIT factors could 
be interpreted as Extraversion-Introversion or 
Neuroticism, as Eysenck (1965) suggested, 
showed that Extraversion is relatively
independent of the HIT, as its loadings on any
factors even slightly reminiscent of extraversion
were quite low (.44 and .41). The loadings of
the Neuroticism scale were similarly low (the
highest being .49) but appropriately on a
factor labeled disordered thought processes
and emotional disturbance.

Van de Castle and Spicher (1964) used the
HIT to investigate the phenomenon of color 
disturbance. High- and low-scoring groups of 
college students on two tests of anxiety and a 
test of neuroticism were administered 9 chromatic
cards and 15 mixed cards. Subjective
disturbance was measured by means of a 16
pair-item semantic differential. No evidence
for color shock was found. In fact, both groups 
rated the chromatic cards in more favorable 
terms than the nonchromatic or mixed cards. 
The authors suggested that the term “novelty 
shock” be substituted for “color shock” as 
disruption in “set” seemed to mediate
tion in test performance, and this could pos- 
sibly come about were an achromatic card 
suddenly to appear after several chromatic 
cards. 

For evidence of the construct validity of 
several HIT variables, a study by Herron 
(1965) provided an interesting diversion from 
the usual approach. Herron studied the
relationship between eyelid conditionability and
certain HIT factor marker variables presumed
to measure anxiety and neuroticism.
Conditionability was dichotomized with failures
defined as less than 15% conditioned
responses in 60 trials. Using Holtzman et al.'s
(1961) version of HIT factors, the Neuroticism
factor (Ax, V) was found to be positively
related to conditionability. V correlated
.30 and Ax .36 (p < .05) with conditionability.
Holtzman’s Cognitive factor was also
positively related to conditionability (r with I =
.44; r with H = .22). It is interesting to note
that one meaning of this cognitive factor
involves the notion that a person with high
scores (I) can ignore irrelevant aspects of a
stimulus situation in the interest of organizing
a “good” percept. Similarly, in eyelid
conditioning studies, it has been found that the
elimination of distracting stimuli significantly
speeds up the rate of conditioned response
acquisition (Porter, Engel, Brady, & Kropp,
1964). Another interesting trend in these data
was shown by the positive correlation between
the Sensitivity factor (C = .22 and Sh = .26)
and conditionability.

Swartz and Swartz (1968) related the Test 
Anxiety Scale for Children to 11 HIT varia- 
bles using a sample of 120 normal children 
(60 high anxiety and 60 low anxiety) within 
2 weeks of ages 6.7, 9.7, and 12.7 years. The 
four variables found to differ significantly be- 
tween the groups were M (low-anxiety
subjects giving more), Pn (high-anxiety group
giving more), and AA (high-anxiety group
giving more). Two of these differences (M
and AA) replicate the findings of an earlier
study of anxiety in children (Swartz, 1965).
The findings with respect to J diverge 
sharply from previous studies of anxiety and 
Rorschach responses (Sarason, Davidson, 
Lighthall, & Waite, 1958). 

It is significant that these studies and oth- 
ers (Barger & Sechrest, 1961; Doris, Sarason, 
& Berkowitz, 1963; Holtzman et al., 1961, p. 
180) generally fail to show strong relation- 
ships between questionnaire- and rating-as- 
sessed anxiety and HIT indexes. These and 
other findings have prompted Holtzman et al. 
(1961) to conclude: 


Anxiety and Hostility as scored in the Holtzman 
Inkblot Technique are strictly ratings at a fantasy 
level which are not necessarily related in any simple, 
direct way to overt behavior that is judged to be 
anxious or hostile [pp. 180-181]. 


Further, we would add that it is a task for 
theory to try to conceptualize just in what 
way—if not “simple” and “direct”—fantasy 
productions should relate to self-report and 
overt behavior. 

Alcoholism. Cleveland and Sikes (1966) 
compared 70 hospitalized alcoholics with 50 
nonalcoholics on HIT Br, Pn, Decadence (any 
response involving deterioration), and Water 
responses. Chi-square analysis of the data 
indicated that while the criterion groups did 
not differ on Br scores, alcoholics were
significantly higher on Pn (p < .02), Decadence (p
< .001), and Water responses (p < .001).
The results on these and other variables gen- 
erally supported the hypothesis that alcohol- 
ics differ from nonalcoholics in the perception 
of their bodies as “dirty, disgusting, and in a 
state of decay” and in the diffuseness of their 
body boundary concepts. It is important to 
call attention to the stability of their per- 
cepts: no change was found when these sub- 
jects were tested again after 90 days of ther- 
apy. With respect to the finding that alcoholics 
gave more water responses, the only thing 
that can be said is that here is one more posi- 
tive finding in a body of literature where both 
positive and negative results seem equally
likely. One more thing should be said about 
the structure of this study in relation to diag- 
nostic differentiation. The chi-square analyses 
showed that there were more alcoholics giv- 
ing high numbers of responses in the categories
of Pn, Decadence, and Water responses as
compared to low frequencies tabulated, but 
the frequencies in these categories for non- 
alcoholics were considerable, indicating that, 
had a different design been used, a great deal 
of overlap may have been shown between 
these groups. 

Mayfield (1968) hypothesized that altera- 
tions in inkblot perception would occur under 
acute alcohol intoxication because of the al- 
terations in personality that are assumed to 
result under such conditions. On the basis of 
previous results and theorizing, this researcher 
predicted an increase in C and a decrease in 
FA in an acute alcohol injection condition, as 
compared to a dextrose (placebo) injection 
condition. The subjects were paid, white, male 
volunteers (N = 12) from 26 to 50 years of 
age. Both Forms A and B of the HIT were 
used in a before-after design. No significant 
changes occurred on any HIT variable with 
the exception of C, which increased from 3.3 
to 4.8 in the alcohol condition and decreased 
from 3.9 to 2.2 in the placebo-control condi- 
tion. These results were viewed as in line with 
typical psychodynamic theorizing on the ef- 
fects of intoxication (release of inhibitions, 
impulsivity). 

Miscellaneous diagnostic studies. Moseley, 
Duffey, and Sherman (1963) factor analyzed 
the HIT variables along with scores on an 
Inpatient Multidimensional Psychiatric Scale 
(IMPS) and the Minnesota Multiphasic Per- 
sonality Inventory (MMPI). The subjects 
were 82 Veterans Administration patients who 
had been diagnosed as neurotic depressive 
(46) or psychotic depressive (36), the latter
category including involutionals (2), manic
depressives (5), psychotic depressives (7),
and schizo-affectives (22). Ten factors
emerged in this study. Three factors were
defined only by HIT variables: Organized
Normal Ideation (highest loadings on I, M, and
H), Scoring Factor (A, L, and negatively by
R), and Fantasized Hostility with Anxiety 
(Hs, Ax, and M). The HIT was found to have 
little relationship to the MMPI Scales, but 
significant relationships were observed in the 
pattern of loadings when the clinical ratings 
of overt behavior (IMPS) were considered. 
A Withdrawal and Disorientation factor 
showed significant loadings on IMPS
Disorientation and Grandiose Expansiveness and HIT
Sx, and V. A Fluctuating Responsiveness to
Environment factor showed significant load- 
ings on HIT FD (negatively), C, and Sh, and 
IMPS variables Paranoid Projection and Per- 
ceptual Distortion. It is to be noted that all 
of the variables described here had loadings of 
.40 or above with their respective factors. The 
authors concluded that these observed over- 
laps in common variance are meaningful and, 
therefore, support the construct validity of 
the HIT. 

In an attempt to develop measures of so- 
matic concern to be used in psychiatric evalu- 
ation, Endicott and Jortner (1967) related 
ratings of somatic preoccupation obtained 
during interviews to selected HIT indexes of 
this variable. The subjects in this study were
both inpatients and outpatients manifesting a 
variety of disorders. For the inpatients, the 
correlations between the number of At
responses and the HIT At score and rated
somatic preoccupation were .26 and .27,
respectively.
tively. Animate mutilation and blood re- 
sponses on the HIT both correlated .21 with 
rated somatic concern. For the outpatient 
group, no HIT score correlated with somatic 
concern. A single measure of somatic concern, 
derived through a multiple regression equa- 
tion, combining HIT At, animate mutilation, 
blood and death responses, and the MMPI 
Hs scores, was found to correlate .44 and .49 
with rated somatic concern in inpatients and 
outpatients, respectively. 

Conners (1965) found significant differences
in individual HIT variables and in factor
scores when the HIT was administered to
disturbed children receiving outpatient care 
and to a control group of nonclinic children. 
The disturbed children (neuroses and conduct 
disorders) received higher scores on R, Ab, 
vr P on all other variables except 

> 93, Ab, Hs, Pn, and B. Using HIT factor 
Scores, this researcher also found that the neu- 
egeo pa more Dep differen- 

E primarily) and more inhi- 
bition (RT, R, and A negatively) than the 
conduct problem children. This picture seems
quite consistent with usual descriptions of
personality problem children.

In a clinical normative effort, Morgan (1968)
provided data on four clinic samples
on all 22 HIT variables and compared
the results to the appropriate validation
sample in Holtzman et al. (1961). The only
significant difference (p < .05) found between
an elementary school sample and the appro- 
priate comparison group was on V (clinic 
sample mean = 12.67; standardization sam- 
ple mean = 5.39). The clinic sample of sev- 
enth-ninth graders differed significantly from 
the norms on A (clinic mean = 25.46;
standardization sample mean = 19.23) and M
(clinic mean = 36.76; standardization sample
mean = 24.37), and these same differences
held for a clinic group of tenth-twelfth
graders. The only variable showing a significant
difference from the norms in the college-clinic
sample was I (clinic group mean = 7.22;
standardization group mean = 11.08). This
study does not seem to provide strong support
for the clinical usefulness of the HIT, but
several shortcomings of the study should be
noted. First, the degree of pathology in the
"clinic" sample is questionable since none of
the subjects had ever been institutionalized
for psychiatric reasons. Second, one can
always question the suitability of a
standardization sample as a reference group for
comparison purposes. There is no guarantee that
Holtzman's groups were comparable to
Morgan's (1968) in every important respect, even
though they were normal and from the same
general socioeconomic level.

Krippner (1967) related the reading
improvement of 24 second-sixth graders to the
22 HIT variables. Reading improvement was
assessed by differences between scores
obtained on alternate forms of the California
Reading Test administered at the beginning
and end of a 5-week reading clinic experience.
Four HIT variables correlated significantly
(.01 level) with reading improvement: j
(.57), Sh (—.60), V (—.96), and Hs p 
Since age was not controlled in this study,
these correlations are probably spuriously
high.


DISCUSSION AND CONCLUSIONS


The past 10 years have yielded a substantial
body of literature relative to the validity
of the HIT. As the research stands, it is
probably clear that no definite answer can be
given to the question "Is the HIT better than
the Rorschach?" With reference to the formal
aspects of the HIT—the basic structure of 
the instrument, the elaborate reference group 
norms provided, parallel forms—we can agree 
with Forer's (1965) review in finding the HIT 
a very attractive research instrument, and, 
with regard to the latter, many of the person- 
ality and developmental studies reviewed here 
would bear us out. Still, to properly judge the 
general advantages of the HIT over the Ror- 
schach, many more direct comparative studies 
are needed. 

One can agree with Eysenck's (1965) view 
that a disproportionate emphasis on reliability 
as compared to validity is found in the initial 
construction of the HIT. A more serious at- 
tack, however, could be leveled at the lack of 
concern, or rather, the superficial concern for 
the conceptual validity of the instrument 
shown in the introductory HIT monograph
(Holtzman et al., 1961). This is not to agree 
with Eysenck's (1965) condemnation that va- 
lidity lags so much behind reliability that it 
"demonstrates pretty conclusively that the 
underlying notion of the Rorschach test is at 
fault [p. 217]." This statement, besides being 
illogical, ignores the issues involved. First of 
all, it can be seriously doubted that anyone 
knows for certain what the "underlying no- 
tion" of the Rorschach or the HIT is or ought 
to be. Second, it is questionable that 10 years 
of research with any instrument, no matter 
how similar to the Rorschach, can have any- 
thing to say about the validity of the theory 
behind the technique. This is especially true 
when, as has been the case, many researchers 
ignore such basic questions as, How should 
this variable relate to normality-pathology, to 
dynamics, to overt behavior, to fantasy? 

Deficiencies in the designs of specific stud- 
ies and lacunae in the general areas of re- 
search covered by this review have been dis- 
cussed elsewhere in this article. It will suffice 
to point out here that with the possible excep- 
tions of developmental studies and those 
dealing with technological extensions of the 
HIT, many more studies will be needed in all 
areas before empirical generalizations concern- 
ing this instrument's validity can be offered 
with any confidence. This is especially true 
when questions are raised concerning the
diagnostic efficiency of the HIT in comparison
to that shown by the Rorschach. Besides the
dearth of studies of differential diagnosis us- 
ing the HIT, there are even fewer studies re- 
ported in which direct comparisons with the 
Rorschach are effected. 

The scarcity of relevant studies should not 
detract from the diagnostic potential shown 
by the HIT as most of the well-designed stud- 
ies reviewed have produced positive findings.
Moseley’s (1963) discriminant function analy- 
sis (no doubt greatly facilitated by the psy- 
chometric characteristics of the HIT) 
achieved a remarkable degree of diagnostic 
differentiation. His weights should be further 
cross-validated, and his approach might serve 
as a model for future research involving a 
wider range of nosological groupings. To be 
sure, this should not be the only approach to 
research on diagnostic differentiation. Previ- 
ous critiques of research with the Rorschach 
by Schafer (1948), Ainsworth (1954), and 
Holt (1968) have raised important questions 
concerning the relevance of studies that have 
been content to relate individual test scores to 
diagnostic comparison groups. Research vali- 
dation of these individual scores would, of 
course, be welcome. But if it is true, as Schafer 
(1948) and Holt (1968) have asserted, that 
the clinical use of inkblot perception and other 
tests involves primarily the description of a 
subject’s personality and behavior in psycho- 
logical terms, and only secondarily his as- 
signment to a diagnostic category, then the 
peripheral relevance of score validation to 
the validity of test interpretation becomes 
clear. Another issue that confronts the score
versus diagnostic category approach is based 
on the observation, frequently voiced by psy- 
chologists and psychiatrists, that protocols of 
“normal” subjects show a surprisingly high 
degree of pathology. Needless to say, the claim 
that the test is at fault when this occurs can 
easily be countered by the assertions that 
there is something wrong with nosological 
schemes, with relating test scores alone to 
these schemes, or both of the above. 

It can be hoped that future research with 
the HIT will address itself in some way to 
the issues of personality description versus di- 
agnostic classification and test interpretation 
versus test score. One solution to these issues 
might be to pay more attention to nonnosological
criteria (e.g., behavioral rating scales,
conditionability scores). To some extent this 
has been done, and it is interesting to note 
that in the studies reviewed here, some of the 
strongest and most consistent findings were 
observed in studies where HIT variables were
related to such nonnosological dimensions as 
cognitive development, aggression, the
process-reactive distinction, and body image.


DIRECTIONAL STATISTICAL HYPOTHESES AND
COMPARISONS AMONG MEANS


JULIET POPPER SHAFFER 1


University of Kansas 


A reformulation of the rationale for the t test of the difference between two means,
similar to but distinct from that proposed by Kaiser, is offered. Basically, the
traditional two-sided formulation, with the null hypothesis $\mu_1 = \mu_2$ versus the
alternative hypothesis $\mu_1 \neq \mu_2$, is replaced by a simultaneous test of two one-sided
hypotheses: $\mu_1 \le \mu_2$ versus $\mu_1 > \mu_2$; and $\mu_1 \ge \mu_2$ versus $\mu_1 < \mu_2$. Two specific
issues are discussed in the context of this reformulation, and it is shown that a more
satisfactory treatment of these issues is possible than under the usual formulation.
The issues are (a) symmetry of the null and alternative hypotheses and (b) the sizes
of the two rejection regions. A possible extension of the formulation to encompass
multiple comparisons of more than two means is offered.


The classical formulation of the hypothesis
tested in the two-sided t test for comparing
the means of two groups is the null hypothesis
$H_0: \mu_1 = \mu_2$ (or $\mu_1 - \mu_2 = 0$), where $\mu_1$ and $\mu_2$
are the means of the two groups, versus the
alternative hypothesis $H_a: \mu_1 \neq \mu_2$ (or
$\mu_1 - \mu_2 \neq 0$). Both the null and alternative hypotheses
are formulated in the context of a more general
model that specifies independent observations
from normal distributions with equal variances;
this general model is assumed throughout the
article. The test consists of dividing the set of
possible outcomes into those which would lead
to the decision to reject $H_0$ in favor of $H_a$ (the
critical region or region of rejection) and those
that would result in acceptance of $H_0$. The
probability of the outcome being in the region
of rejection, given that $H_0$ is true, is called the
size of the test and is denoted by $\alpha$; it is also
the probability of making a Type I error, that
is, of rejecting $H_0$ when it is true. (When the
probability of a Type I error varies for different
parameter values satisfying the null hypothesis,
the size of the test, $\alpha$, is generally defined as
the maximum probability of a Type I error.)

Kaiser (1960) pointed out that the
traditional two-sided t test formulated in the above
manner is of little relevance in most practical
applications. While it may be the appropriate
hypothesis for some tests of exact quantitative
models, usually, when the null hypothesis of
no difference is rejected, the experimenter
wishes to state that there is a difference in a
specific direction, while the alternative
hypothesis is formulated in such a way that he
can state only that a difference exists. Kaiser
reformulated the test as a three-decision
procedure, where the decision is to accept
either the null hypothesis $H_0$, the alternative
hypothesis $\mu_1 < \mu_2$, or the alternative
hypothesis $\mu_1 > \mu_2$. Separate critical regions are
defined for accepting each of the two
directional alternative hypotheses. If these are set
equal in size, and each equal to $\tfrac{1}{2}\alpha$, the test is
identical to the usual t test, but rejection of
the null hypothesis implies acceptance of a
difference between the means in a specified
direction. Kaiser also pointed out that when
formulated in this way, in addition to the usual
Type I and Type II errors, one can consider
a Type III error, defined as rejecting the null
hypothesis correctly but choosing the wrong
alternative, for instance, deciding that $\mu_1 > \mu_2$
when in fact $\mu_1 < \mu_2$.
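
The three-decision procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Kaiser's own presentation: the function names and sample data are hypothetical, the pooled-variance t statistic follows the equal-variance model assumed in the article, and the critical value is supplied by the caller (for example, from a t table at the upper $\tfrac{1}{2}\alpha$ point for $n_1 + n_2 - 2$ degrees of freedom).

```python
import math
import statistics


def pooled_t(x, y):
    """Pooled-variance t statistic for mean(x) - mean(y), with its df."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = (
        (n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)
    ) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / standard_error, df


def three_decision(x, y, t_crit):
    """Three-decision rule; t_crit is the upper alpha/2 critical value of t."""
    t, _ = pooled_t(x, y)
    if t > t_crit:
        return "mu1 > mu2"
    if t < -t_crit:
        return "mu1 < mu2"
    return "accept H0"


# Hypothetical data; 2.306 is the tabled upper .025 point of t with 8 df.
group_1 = [12, 14, 15, 17, 19]
group_2 = [8, 9, 11, 12, 10]
print(three_decision(group_1, group_2, t_crit=2.306))
```

With both critical regions of size $\tfrac{1}{2}\alpha$, the rule rejects in exactly the cases where the usual two-sided test rejects; the only change is that each rejection carries a stated direction.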

Another formulation of this test, briefly 
mentioned as an alternative possibility by 
Kaiser, is pursued in this article, as it provides 
a different point of view that helps to clarify 
a number of issues connected with the use of 
the test. The two-sided t test is reformulated
as a simultaneous test of two directional
hypotheses:


${}_1H_0: \mu_1 \le \mu_2$ versus ${}_1H_a: \mu_1 > \mu_2$


and


${}_2H_0: \mu_1 \ge \mu_2$ versus ${}_2H_a: \mu_1 < \mu_2$.


Each hypothesis can be considered
independently, or they can be treated jointly in a
generalization of the multiple comparisons
paradigm, with two comparisons tested for a
single pair of means. As no three-decision rules
are required, the large body of theory on
hypothesis testing, dealing with a null
hypothesis versus an alternative hypothesis, can
be applied. A one-sided t test is appropriate
for each of the individual hypotheses. The size
of a t test with this type of null hypothesis is
defined as the maximum probability of a Type
I error as noted above; this maximum is
attained when $\mu_1 = \mu_2$.
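
The statement that the maximum Type I error probability is attained at $\mu_1 = \mu_2$ can be checked numerically. The sketch below makes a simplifying assumption that is not in the article: it replaces the t test with a known-variance z test, so that the rejection probability can be computed from the standard normal distribution using only the Python standard library. The function name and the chosen values of the true difference are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rejection_probability(delta, standard_error, alpha):
    """P(reject 1H0: mu1 <= mu2) when the true mu1 - mu2 equals delta.

    Assumes a one-sided, known-variance z test of size alpha, used here as a
    stand-in for the one-sided t test discussed in the text.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_crit - delta / standard_error)


# Every delta <= 0 satisfies the null hypothesis 1H0.
for delta in (-2.0, -1.0, -0.5, 0.0):
    print(delta, round(rejection_probability(delta, 1.0, 0.05), 4))
```

Under these assumptions the rejection probability rises steadily as the true difference approaches zero from below and equals $\alpha$ at the boundary, which is why the size of the test is evaluated at $\mu_1 = \mu_2$.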

Two specific issues are discussed in the 
context of this reformulation: symmetry of 
the null and alternative hypotheses and the 
sizes of the two rejection regions from the 
point of view of multiple comparisons theory. 
In addition, an extension of the formulation to 
the case of more than two means is considered. 

1. Much criticism has been leveled against 
the formulation of null hypotheses as single 
values of functions of parameters (e.g., 
$\mu_1 - \mu_2 = 0$), the argument being that any
exact value must be false in all (or almost all) 
populations (see, e.g., Bakan, 1966; Edwards, 
Lindman, & Savage, 1963). In the present 
formulation, each null hypothesis is comparable 
to its alternative, so the simultaneous hy- 
potheses are more satisfactory in this respect 
than the original. 

2. Two approaches to the question of
performing a simultaneous test can be taken: each
hypothesis can be considered separately, with
each tested using a one-tailed t test at a
traditional level of significance, .05 or .01, or the
tests can be considered simultaneously, with
the overall Type I probability error rate (i.e.,
the probability that at least one null hypothesis
will be rejected, given that both are true;
Miller, 1966, p. 6) set at a conventional level,
as in multiple comparisons approaches. The
rationale for the multiple comparisons
approach is particularly compelling in this
instance of maximum interdependence of the
two tests. The overall Type I probability error
rate is easily determined in this case: the
probability of rejecting at least one of the
hypotheses, given that both are true, is the sum
of the maximum Type I error probabilities,
that is, the sum of the sizes of the two tests.
poer Words, to set the probability error 

€qual to o, the size of the test of the first 
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null hypothesis should be set equal to ai, and 
that of the second equal to a», with the restric- 
tion that a1 + a» = a. In general, for a sym- 
metric simultaneous test of the two hypotheses, 
o3 and a» should each be set equal to $a; this 
gives rejection regions equivalent to E 
rejection region of the traditional two-sided 
test. A one-sided test results when either e 
or a» is set equal to a, while the other is set 
equal to zero; this is equivalent to testing 
only one of the two simultaneous brune. 
Intermediate cases, wherea: and az are unequa 
but neither is equal to zero, are equivalent Ls 
treating the two tests asymmetrically, in 
sense that the probability of correctly pent 
illo if ay — a» were equal to c would not ns 
the same as the probability of correctly pad 
ing oH» if a1 — a» were equal to —¢, po 
c is a positive constant; in other words, a 
two tests would have unequal power e vi 
respect to alternative hypotheses ges 
in magnitude. This might be desirable ! bs 
were more important to detect differences dE 
one direction than in the other, or if a i-o 
in one direction were considered more i 
than in the other. Unfortunately, it 18 pe 
possible to quantify these considerations 50 * 

to lead to a definitive choice of a1 and az- : 
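
The allocation of α between the two directional tests can be illustrated with a minimal sketch. It is not part of the original formulation: it assumes a standardized mean difference z with known standard error (a normal approximation in place of the t distribution), and the function names are hypothetical.

```python
from statistics import NormalDist


def directional_critical_values(alpha1, alpha2):
    """Critical values for a standardized mean difference z = (m1 - m2) / SE.

    alpha1 is the size of the test of 1H0 (mu1 <= mu2); alpha2 is the size
    of the test of 2H0 (mu1 >= mu2). Assumption: z is treated as standard
    normal when mu1 = mu2 (known variance), not as a t statistic.
    """
    std = NormalDist()
    # A size of zero means that test is never carried out (one-sided case).
    upper = std.inv_cdf(1 - alpha1) if alpha1 > 0 else float("inf")
    lower = std.inv_cdf(alpha2) if alpha2 > 0 else float("-inf")
    return lower, upper


def decide(z, alpha1, alpha2):
    """Return the directional conclusion reached for an observed z."""
    lower, upper = directional_critical_values(alpha1, alpha2)
    if z > upper:
        return "conclude mu1 > mu2"
    if z < lower:
        return "conclude mu1 < mu2"
    return "no decision"


# Symmetric split: alpha1 = alpha2 = alpha / 2 gives the two-sided cutoffs.
print(directional_critical_values(0.025, 0.025))
# One-sided case: all of alpha is spent on the test of 1H0.
print(decide(-2.0, 0.05, 0.0))
```

With α1 = α2 the cutoffs are those of the traditional two-sided test, and with α2 = 0 no difference in the negative direction, however large, leads to a conclusion.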

In the context of a simultaneous test of two
hypotheses, Kaiser's Type III error is equivalent
to two errors, a Type I error with respect
to one test and a Type II error with respect to
the other. In other words, if, in fact, μ1 < μ2
and we conclude that μ1 > μ2 (rejecting
hypothesis 1H0 and accepting hypothesis 2H0),
we are making a Type I error with respect to
the first set of hypotheses (falsely rejecting the
null hypothesis) and a Type II error with
respect to the second set (falsely accepting
the null hypothesis). In a symmetrical
simultaneous test, the maximum probability of this
type of error is ½α, in a one-tailed test it is α,
while in an asymmetric simultaneous test it
would be between these two limiting values.
Thus, the symmetrical test, within the class
of simultaneous tests, minimizes the maximum
probability of a Type III error.

The increased specificity of the directional
hypotheses as compared with the nondirectional
t test comes at the price of a decrease in
power. While the power of the nondirectional
t test is always greater than α, approaching
the minimum value α as μ1 − μ2 approaches
zero, the power of the symmetric simultaneous
procedure (i.e., the probability of rejecting
the false null hypothesis) approaches a
minimum of ½α as μ1 − μ2 approaches zero,
and is always less than the power of the
nondirectional test, with the difference between
them approaching zero as the magnitude of the
difference between the means increases. (The
difference between them is equal to the
probability of the Type III error; see Kaiser, 1960.)
The much greater relevance of the directional
conclusion in most contexts more than
compensates for this disadvantage of the proposed
simultaneous test.
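
The relation among the three quantities just described can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's computation: it uses a normal approximation with a standardized true difference delta = (μ1 − μ2)/SE, and the function name is hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()


def power_components(delta, alpha=0.05):
    """Power of the symmetric directional procedure vs. the two-sided test.

    delta is the standardized true difference (mu1 - mu2) / SE.
    Assumption: the test statistic is normal with mean delta and unit
    variance; a t-based calculation would need the noncentral t instead.
    """
    cutoff = _STD.inv_cdf(1 - alpha / 2)
    size = abs(delta)
    # Probability of rejecting in the direction of the true difference.
    directional = 1 - _STD.cdf(cutoff - size)
    # Probability of rejecting in the wrong direction (Type III error).
    type_iii = _STD.cdf(-cutoff - size)
    return {
        "directional": directional,
        "type_iii": type_iii,
        "nondirectional": directional + type_iii,
    }


for delta in (0.0, 0.5, 1.0, 2.0, 3.0):
    parts = power_components(delta)
    print(delta, {name: round(value, 4) for name, value in parts.items()})
```

As delta approaches zero the nondirectional power approaches α and the directional power approaches ½α; the gap between them is the Type III error probability, which shrinks rapidly as the difference between the means grows.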

It should be noted that all points made thus 
far can easily be extended to the case where the 
traditional hypothesis is μ1 − μ2 = a, where a
is a specified constant, and the alternative is
μ1 − μ2 ≠ a. Similarly, the reasoning applies
in the case of a single population where H0
would normally be formulated: μ = b, where
b is a specified constant, and Ha would be
μ ≠ b.

The directional formulation should also be 
valuable in testing differences and other 
contrasts among more than two means. A full 
development of this topic is well beyond the 
scope of this article; however, a possible
formulation of such an extension for the case of
three means, where only differences among
means are of interest, is proposed.

Given samples from three populations, with
means μ1, μ2, and μ3, the usual multiple
comparisons tests of all differences between pairs
of means might be formulated as the
simultaneous testing of three hypotheses:


1H0: μ1 = μ2 versus 1Ha: μ1 ≠ μ2
2H0: μ1 = μ3 versus 2Ha: μ1 ≠ μ3
3H0: μ2 = μ3 versus 3Ha: μ2 ≠ μ3.

Replacing these by directional hypotheses, the 
procedure could be represented as the simul- 
taneous testing of six hypotheses: 

1H0: μ1 ≤ μ2 versus 1Ha: μ1 > μ2
2H0: μ1 ≥ μ2 versus 2Ha: μ1 < μ2
3H0: μ1 ≤ μ3 versus 3Ha: μ1 > μ3
4H0: μ1 ≥ μ3 versus 4Ha: μ1 < μ3
5H0: μ2 ≤ μ3 versus 5Ha: μ2 > μ3
and
6H0: μ2 ≥ μ3 versus 6Ha: μ2 < μ3.



The usual multiple comparisons tests, for 
instance, the studentized range tests (Miller, 
1966), by symmetry, obviously give equal 
weight to all six hypotheses. The problem of 
devising such tests with unequal weighting 
of the hypotheses, especially within the pairs of 
directional hypotheses, is a formidable one. 
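
The pattern by which each pairwise nondirectional hypothesis is replaced by two directional ones extends mechanically to any number of means. The following sketch only enumerates the hypothesis pairs; it does not implement any test, and the function name and the text labels it produces are hypothetical.

```python
from itertools import combinations


def directional_hypotheses(k):
    """List (null, alternative) pairs for all pairwise comparisons of k means.

    Each pair of means (i, j) with i < j contributes two directional
    hypotheses, so k means yield k * (k - 1) hypotheses in all.
    """
    hypotheses = []
    for i, j in combinations(range(1, k + 1), 2):
        hypotheses.append((f"mu{i} <= mu{j}", f"mu{i} > mu{j}"))
        hypotheses.append((f"mu{i} >= mu{j}", f"mu{i} < mu{j}"))
    return hypotheses


for number, (null, alternative) in enumerate(directional_hypotheses(3), start=1):
    print(f"{number}H0: {null} versus {number}Ha: {alternative}")
```

For three means this reproduces the six hypotheses listed above; for four means there would be twelve.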

Finally, the points made above are not 
restricted to situations appropriate for t
tests or even to considerations only of mean 
differences.


CORTICAL LESIONS AND AUDITORY DISCRIMINATION*

Wayne State University          University of Illinois


Studies investigating the effects of lesions of the auditory cortex upon auditory
discriminations are reviewed. Discriminations studied include frequency,
intensity, duration and other temporal cues, complex spectral differences, and changes
in temporal patterning. Factors determining the effectiveness of lesions are size
and completeness of lesion, whether the lesion involved one or both
hemispheres, nature of the testing procedure, size of the signal differences to be
discriminated, and nature of the discrimination. In view of the numerous factors,
comparison of different studies is often difficult because of confounding. In
terms of the factors listed above, (a) patterning changes, in which signals are
not changed but merely rearranged in order of presentation, suffer more than
do tasks involving the detection of new signals or the recognition of different
signals; and (b) discrimination tasks requiring recognition suffer more than
do tasks requiring only the detection of a new signal. It appears probable
that the nature of the discrimination task interacts with the locus of the lesion
and that failures on different types of tasks reflect different deficits.


While the use of behavioral testing of 
auditory functioning has provided indexes of 
what an animal “hears,” such data are often 
quite difficult to interpret. There are many 
problems in attempting to understand the 
effects of lesions of the auditory system. First, 
one is faced with the problem of producing 
equivalent lesions—to the extent his skill 
allows—and then in determining whether 
lesions do differ among his animals. Second, 
the testing procedure may produce differences 
in performance. Such factors as the auditory
dimension along which discrimination is to be
measured, the nature of the discrimination
(relative or absolute), and the testing procedure
itself must be taken into account. Thus,
if one is concerned with the effect of a par- 
ticular lesion upon hearing, one is faced with 
the problem of comparing data obtained from 
animals with lesions that may differ, with 
species differences, and with testing differ- 
ences. Apparent disagreements among the vari- 
ous studies are not surprising. 

Another consideration is the general nature 
of the experimental question posed. One may 
be concerned either with the effects of varying
the lesion or with the effects of varying
the nature of the discrimination to be made.
In most of the studies that have been carried
out to date, the first type of question has been
posed, with only a single type of discrimination
studied. However, if one is concerned
with the second type of question (the extent
to which the same lesions will interfere with
different types of discrimination), then a
different experimental strategy is called for:
one where each animal is tested on a battery
of discrimination tasks. With such a procedure,
each animal can serve as his own control,
and variance due to animal differences,
lesion differences, and procedural differences
is largely eliminated. While cumulative effects
due to testing experience, aging, and long-term
recovery from lesion effects may also
affect the results, these factors are probably
relatively unimportant as contrasted with
others.


BEHAVIORAL TESTING PROCEDURES

When one reviews the various studies, it
becomes evident that testing procedures are
of critical importance. For example, one set
of investigators concluded that cortical lesions
do interfere with frequency discrimination,
while another set concluded that cortical
lesions do not, the different conclusions arising
from the animals' performances on different
types of frequency-discrimination tasks.
Consequently, before reviewing the discrimination
studies, it is important that test procedures
be considered in some detail.

[Figure 1 diagram not reproduced. Column headings: Nature of Task, Intertrial Interval, Trial. Row groups: Detection, Recognition.]

FIG. 1. Diagrams illustrating various discrimination-testing procedures. Tasks 1-7 are
single-response detection tasks in which all trials consist of positive stimuli. Tasks 8-12 are
recognition tasks involving either go, no-go, or two-alternative, forced-choice response
contingencies. Tasks 1 and 2 show the presentation of higher-frequency tone pulses during the
trial, either alternating with the neutral intertrial pulses (Task 1) or completely replacing the
neutral pulses (Task 2); Task 3 has a silent intertrial interval and a series of negative tone
pulses that change to a higher-frequency (positive) pulse. Tasks 4 and 5 involve a reordering of
the neutral tone pulses, Task 6 the dropout of one of the neutral tone pulses, and Task 7 the
shortening of the neutral pulses. Task 8 involves not only the detection of a change but the
recognition of the direction of the change. Tasks 9 and 10 involve absolute recognition of the
tones. Task 11 requires the recognition of the changing or unchanging nature of the auditory
input. Task 12 requires the recognition of different combinations of auditory input, the same
components occurring in both positive and negative trials. Note: For illustrative purposes only,
frequency changes have been used to illustrate most of the test procedures.

Figure 1 illustrates most of the procedures
that have been used in studying auditory
discrimination. These procedures are first
described briefly, and then their differences are
considered. One faces a problem in attempting
to label these different test procedures, and
it has not been possible to come up with labels
that completely describe the different task
requirements. However, for the sake of having
descriptive terms to use in referring to the
procedures and realizing that they are only
partially adequate, one might differentiate
between tasks requiring that the animal “detect”
a change in the auditory input and tasks
requiring that the animal “recognize”
(“identify”), in some more or less absolute fashion,
different auditory stimuli. The task of detecting
a change obviously requires some ongoing
auditory input that is altered in some fashion.
(This definition includes the detection of a
weak tone in the presence of a silent
background, though we are not primarily
concerned here with thresholds of audition.)
Tasks 1-7 in Figure 1 are detection tasks.
Within this category, however, one can
differentiate between those tasks requiring the
animal to detect a new tone, differing in
frequency, intensity, etc., from the neutral
background tones (Tasks 1-3) and those tasks
requiring the animal to detect a change in the
manner in which the neutral background
tone(s) are changed, though no new tones are
introduced (Tasks 4-7).

Tasks 8-12 are recognition tasks. Task 8 is 
similar to Task 1 in that the animal is re- 
quired to detect a change; however, it is also 
required to recognize the direction of the 
change. Tasks 9 and 10 are "absolute" dis- 
crimination tasks. Task 11 requires the recog- 
nition of constant or varying inputs. Task 12 
requires the comparison of tones, which, 
depending upon how they are paired, may 
either make up positive or negative stimuli— 
the same components appearing in both posi- 
tive and negative trials. 

While all of these tasks are discrimination 
tasks, it is clear that they differ widely, and 
they certainly are not equally difficult. It is 
therefore useful to consider some of the major 
differences among the various tasks. The sig- 
nificant variations in testing appear to be 
the following: 

1. The nature of the intertrial interval. The 
length of the intertrial interval is one impor- 
tant consideration. However, possibly a more 
important consideration is whether or not the 
intertrial interval is silent or is filled with 
continuously repeated “neutral” stimuli. 
When the intertrial interval is filled with
neutral stimuli, they form an auditory
background in the presence of which the
introduction of new tones is to be detected. This is a
“single-response” procedure. With this
procedure, of course, the intertrial interval's
length must vary randomly in order to
eliminate temporal cues to trial onset and must be
relatively long, relative to trial length, in order
to keep false (spontaneous) responses to a
minimum. If intertrial intervals are silent,
either a “go, no-go” or a “two-alternative,
forced-choice” procedure can be used to test
for recognition. In the first case, positive
stimuli require a response, and negative
stimuli do not. In the second, a response is
required on each trial, different responses being
associated with different auditory stimuli.
Probably, task difficulty is greater for silent
than for filled intertrial intervals, since, if
the intertrial interval contains neutral stimuli,
the animal’s task becomes one of merely
detecting the change from the neutral to the
positive stimuli. When the intertrial interval
is silent, memory, as well as discrimination
ability, is involved in selecting the appropriate
response, and the task becomes one of
recognition.

2. The number of alternative responses
available to the animals. From the preceding
discussion, it is clear that the typical testing
procedures can be classified as those requiring
a single response (or its inhibition) and those
requiring multiple responses with a separate
response associated with each type of auditory
signal (the simplest case being a two-alternative,
forced-choice). Several studies have been
carried out with dogs in which performance
on a go, no-go task and on a two-alternative,
forced-choice task (either go-right, go-left or
responses with left-foot, right-foot) was
compared for two general types of discrimination,
namely, cues based on direction of the signal
source and cues based on spectral differences
of the signal. In addition, the relative
potencies of the spectral and directional cues were
compared by combining them to determine
which cue was more effective in controlling
the discrimination performance (Dobrzecka &
Konorski, 1967, 1968; Lawicka;
Szwejkowska, 1965, 1967). An interaction was
found between the number of responses
available to the animal and the nature of the cue.
Spectral cues were found to be more effective
in the acquisition of go, no-go discriminal
responses and to be more potent than
directional cues when the two types of cues were
presented in various combinations. On the
other hand, directional cues were found to
be more effective in the acquisition of
two-alternative, forced-choice discriminal responses
and to be more potent than spectral cues
when the two types of cues were presented
in various combinations. The superiority of
the directional cues in tasks involving
responses to the right or left is probably
explained by the relatively direct connection
between differential kinesthetic cues associated
with orienting responses to the different signal
locations and the kinesthetic cues associated
with two differing responses. On the other
hand, it is somewhat more difficult to
understand why spectral cues should be superior in
go, no-go tasks. Lawicka (1969) offered a
possible explanation of this difference.

Aside from possible effects of different 
numbers of alternative responses upon the 
acquisition of different types of auditory dis- 
criminations, it is useful to consider the 
numbers of alternative responses in order to 
describe some of the commonly used testing 
procedures. Most behavioral studies to date 
have used a single-response procedure. If the 
intertrial interval is filled with a background 
auditory stimulus, the animal remains at rest, 
and one might well consider this merely pas- 
sive immobility rather than active inhibition. 
In such a situation, the presentation of a new 
auditory signal calls for a response to indicate 
the animal’s detection of the change in
auditory input. Thus, the procedure is termed
“single response” and the stimuli filling the
intertrial interval “neutral” rather than
“negative.”

However, if the intertrial interval is silent, 
the animal is required, upon the presentation 
of an auditory signal, either to make the re- 
sponse or inhibit it. This situation might 
properly be termed go, no-go, and the stimu- 
lus to which no response is made, “negative.” 
Evidence suggests that a go, no-go presenta- 
tion may be more difficult to acquire and 
more likely to suffer from central lesions 
(Thompson, 1960).

In many respects, the two-alternative, 
forced-choice procedure is similar to the go, 
no-go procedure. In both cases, the intertrial 
interval is silent and the trial well delineated. 
The animal, upon trial onset, is required to 
make a decision as to the response. 

3. Nature of the discrimination. When the 
intertrial intervals are filled with neutral 
stimuli, it is probable that response to the 
positive stimulus involves merely the detection 
of the change from the neutral to the positive 
stimulus, and the discrimination is relative, 
rather than absolute.³ With this method of
stimulus presentation, one should differentiate
between the task in which a new stimulus
value occurs as the positive stimulus (Tasks
1 and 2, Figure 1) and the situation in which
no new tones are presented. In this latter
case, the neutral stimulus may consist of
alternating auditory values, and the positive
stimulus may be presented either as a
rearrangement of the order of presentation, that
is, a “patterning change” (Tasks 4 and 5,
Figure 1), or as a dropout of one of the
neutral stimulus components (Task 6, Figure 1).
It is also possible to present a positive
stimulus by altering only its duration (Task 7,
Figure 1). It is clear that the detection of
new stimulus values is an easier task than
detection of a change in order, the elimination
of one of the neutral stimulus components, or
a change in tone duration (Diamond,
Goldberg, & Neff, 1962; Diamond & Neff, 1953,
1957; Scharlock, Neff, & Strominger, 1965).

³ It is conceivable that the animal might develop
absolute discrimination in such situations. If one
were concerned with the question of whether the
animal was making a relative discrimination
(detecting a change) or an absolute discrimination,
then varying neutral and/or positive stimulus values
might be used from trial to trial in order to
eliminate the constant aspect of the discrimination.

When intertrial intervals are silent, a
recognition task is involved; the recognition,
however, may involve either absolute or relative
discriminations. For example, a single fixed
tone value may be presented at each trial, and
the animal required to decide whether the
tone presented on any trial is, for instance,
“high” or “low” (Task 9, Figure 1). If
intertrial interval durations are of significant
length, such a procedure requires absolute
discrimination. On the other hand, a trial
can consist of a tone or a series of tones
which, after varying lengths of time following
the trial onset, changes to a different value
(Task 3, Figure 1). Here, the animal’s task
is to detect the change, and thus it would
appear to involve a relative discrimination.

4. Nature of reinforcement. A great deal of
concern has properly been shown by persons
concerned with learning as to the relative
effects of punishment and reward, and a great
deal of data has been collected. However,
within the limited range of concern of studies
of animal auditory discrimination, parametric
studies are lacking that make it possible to
make precise statements of, for example, when
punishment is preferable to reward or what
the optimal intensity of the punishment or 
reward is. Certainly, one must recognize that 
punishment may be so severe or frequent 
that it may interfere with the acquisition of 
a difficult discrimination, but, beyond this 
cautionary observation, decisions concerning 
the nature and strength of the reinforcement 
must still be based on the experimenter's 
biases and/or his evaluation of the animal's 
responses. 

In addition to this problem, some com- 
ments should be made concerning the patterns 
of reinforcement that may be used in the 
single-response or go, no-go procedures. Two 
patterns of reinforcement may be used—sym- 
metrical and asymmetrical. The asymmetrical 
procedure has been more commonly used and 
involves, in the case of food, the awarding of 
food for making a response to the positive 
stimulus, and, in the case of shock, presenta- 
tion of shock for failure to make the response 
to the positive stimulus. Thus, with asym- 
metrical reinforcement, a response bias is 
likely to develop in the direction of a tend- 
ency to make the response that leads to 
reward or avoids punishment. If care is taken, 
such bias is not severe enough to interfere 
with evaluating the animal's responses. How- 
ever, if lesions decrease the animal's auditory 
discrimination ability sufficiently, or if 
reinforcement is strong enough or frequent 
enough, then the response bias may show up 
very strongly in a tendency, in the go, no-go 
situation, nearly always to make the response 
to both positive and negative stimuli; and, in 
the single-response situation, to make many
“spontaneous responses” during the
presentation of the intertrial neutral stimuli. To some
unknown extent, then, the response bias
resulting from asymmetrical reinforcement in
such test situations may enter to cloud the
interpretation of the effect of lesions upon
discrimination. For example, if, following a
cortical lesion, the animal invariably responds
to both positive and negative stimuli in a go,
no-go situation, the question arises as to
what extent this behavior is a reflection of a
generalized alteration in the animal's
response tendencies and to what extent it
reflects a decrement in the animal’s
discrimination abilities. (These questions, of course,
are similar to those faced by the experimenter
in human psychophysics, where the “response 
criterion” is often a significant factor.) To 
some extent, this criterion difficulty can be 
reduced by using symmetrical reinforcement 
with single-response and go, no-go procedures 
or by the use of mock trials to assess response 
bias in the single-response procedure. Another 
way to overcome the problem is the use of 
the two-alternative, forced-choice procedure 
in which both responses can be reinforced 
appropriately and equally. In many respects, 
this latter procedure appears the most appro- 
priate for handling the criterion problem, since 
with single response or go, no-go procedures, 
even with symmetrical reinforcement, there 
may be inherent differences in the willingness 
to make or withhold responses. 
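
The distinction drawn above between discrimination ability and response bias is the one that signal detection analysis makes in human psychophysics. The sketch below is an illustration only and is not drawn from the studies reviewed: it assumes an equal-variance Gaussian model, applies a log-linear correction so that rates of 0 or 1 remain finite, and uses hypothetical function and argument names for go, no-go trial counts.

```python
from statistics import NormalDist


def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Sensitivity (d') and criterion (c) from go, no-go trial counts.

    hits / misses: responses made or withheld on positive-stimulus trials.
    false_alarms / correct_rejections: the same for negative-stimulus trials.
    Assumptions: equal-variance Gaussian model; log-linear correction
    (add 0.5 to each count, 1 to each total) to avoid rates of 0 or 1.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# An animal that responds on every trial: no measurable discrimination,
# and a strongly negative criterion (a bias toward responding).
print(sdt_indices(hits=50, misses=0, false_alarms=50, correct_rejections=0))
# An animal that responds mostly to positive stimuli: d' well above zero.
print(sdt_indices(hits=45, misses=5, false_alarms=5, correct_rejections=45))
```

A postoperative record of indiscriminate responding yields a d' of zero under this model, so the index cannot by itself say whether discrimination is lost or merely masked by bias; that is the interpretive problem the text describes, and one reason the two-alternative, forced-choice procedure is attractive.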


REVIEW OF BEHAVIORAL FINDINGS


In seeking to measure hearing ability, tests
of absolute sensitivity as well as tests of
differential sensitivity have been used.
Measures of absolute sensitivity involve merely
the detection of the onset of an auditory
input. As such, relatively little information
needs to be processed, and, so long as the
receptor cells are not affected, relatively
severe central and peripheral neural lesions may
have little if any effect on such performance
(Elliott, 1967; Neff, 1960). Neff noted that,
in the cat, complete bilateral ablation of all
known cortical auditory areas, and some areas
extending beyond them, does not eliminate
ability to respond to the onset of sound.
Elliott reported that up to 90% of first-order
fibers may be destroyed with little alteration
of the audiogram. Quite clearly, then,
behavioral measures of absolute acuity may be
quite insensitive as indexes of the integrity
of the auditory system. Because of this, an
increasing number of behavior studies have
concerned themselves with auditory
discrimination abilities. Such studies are now
reviewed, but, in view of the fact that most
of them have used single-response procedures,
which involve only the detection of a change
in auditory input, one should not be too
surprised if major alterations of the auditory
system may result in little if any decrement
in the measured auditory functioning.

In evaluating the effects of lesions upon
auditory discrimination, one can order the
severity of the lesion's effect as follows:

1. No effect: postoperative discrimination 
behavior is unaffected. 

2. Small effect: temporary amnesia from 
which the animal spontaneously recovers over 
a period of a few days or weeks. 

3. Moderate effect: initial amnesia which, 
with postoperative training, is overcome, 
sometimes more quickly, sometimes at the 
same rate as the original, preoperative 
learning. 

4. Severe effect: amnesia, which can only 
be overcome with extended training or which 
can only be partially overcome—with the ani- 
mal displaying poorer-than-normal discrimi- 
nation ability even after extensive training. 

5. Complete interference: no postopera- 
tion discrimination whatsoever is evidenced. 

In addition to these changes, another altera- 
tion in performance may be evident. This is 
an increase in the variability of performance. 
Thus, postoperative behavior may show large 
fluctuation from session to session (or within 
sessions) in the level of discrimination
performance. Since the animal may sometimes
display satisfactory discrimination
performance, it is possible that this type of
inconsistent performance may reflect higher-order
changes than disruption of its sensory proc- 
essing centers. This is a matter of great 
interest, though not well understood. 

In the following sections, research is re- 
viewed in terms of the nature of the discrimi- 
nation to be made. After this, generalizations 
across the various types of discrimination are 
made. Since the various studies reported have 
involved central ablations of the cat or 
monkey most frequently, and since references 
are made to ablations of various size and 
location, maps of cat and monkey cortex are 
presented in Figure 2. 


Frequency Discrimination 

There have been more studies of frequency 
discrimination than of any type of discrimina- 
tion. Some have revealed permanent inter- 
ference by cortical ablations, some temporary 
interference, and some little if any
interference. In part, these differences have reflected
differences in the location and extensiveness
of the lesion; in part, they have reflected
differences in the testing procedures utilized.

FIG. 2. Top: cat cortex; middle and bottom:
monkey cortex. Sensory association areas: PCA,
ALA, AMSA, PMSA (see Thompson, Johnson, &
Hootes, 1963). (Direct auditory projection area of
the monkey lies buried in the sylvian fissure,
primarily on the superior surface of the superior
temporal gyrus. Medial geniculate radiations have
been found to extend primarily to the portion
posterior to the central sulcus [see middle diagram];
however, ablation of the anterior portion of the
superior surface causes retrograde degeneration of
the posterior pole of the principal division of the
medial geniculate [Akert, Woolsey, Diamond, &
Neff, 1959]. In the bottom diagram, the sylvian
fissure is opened up to show the area from which
evoked potentials have been recorded.)

Allen (1943, 1945) used a bell and iron
cup tapped once per second, with fundamental
frequencies of, respectively, 2600 and 720
Hertz (Hz.) in testing dogs for absolute 
recognition of pitches (Tasks 9 and 10, 
Figure 1). A go, no-go avoidance procedure 
with asymmetrical reinforcement was used, 
with the conditioned response being the 
flexion of a foreleg. Delay in reinforcement 
after onset of the positive stimulus was 7 
seconds; nonresponse to the negative stimulus 
had to be maintained for 10-15 seconds. The 
silent intertrial intervals approximated 2 
minutes in duration. Allen found that bi- 
lateral removal of the entire auditory areas 
resulted in amnesia, which training could not 
overcome. Postoperation performance demon- 
strated response to both positive and nega- 
tive stimuli, possibly a reflection of the asym- 
metrical reinforcement. Bilateral removal of 
only parts of the auditory area, however, had 
limited effects upon discrimination, and total 
unilateral lesions had little effect. In these 
latter cases, amnesia was evident, but train- 
ing reestablished the discriminal responses, 
occasionally with difficulty. 

Allen (1943) also used the same test to 
determine the effect of prefrontal lobectomies 
and found only temporary effects upon dis- 
crimination. However, as with the auditory 
lesions, postoperative errors that did occur 
were almost exclusively false positive re- 
sponses. This disappearance of the inhibitory 
response to negative stimuli is reported in 
many studies involving cortical lesions and 
has led to the suggestion that postablation 
nondiscrimination may reflect an inability to 
inhibit responses rather than an inability to 
detect differences (Thompson, 1960). One 
way in which such a possibility might be
tested would be the use of a two-alternative, 
forced-choice procedure. 

Meyer and Woolsey (1952) determined the 
effect of cortical lesions of various sizes and 
location upon frequency discrimination in the 
cat. They used a modified single-response
procedure with tone pulses of 2 seconds and
1-second interpulse intervals. Each trial
consisted of the presentation of 3-12 standard
1000-Hz. pulses, followed by 1 pulse of 1100
Hz. (Task 3, Figure 1). Shock accompanied
by buzzer followed 1 second after the end of
the higher-frequency pulse if the animal had
not moved forward in the running cage. Thus,
the animal was required to detect the
presentation of a single higher-frequency pulse and
respond within 3 seconds of its onset to avoid 
the shock. It was estimated that the tones 
were about 25-30 decibels (db.) above the 
masking level of the experimental room’s 
background noise—not a particularly loud 
tone level. They found that extirpation of 
limited areas of the auditory cortex produced 
amnesia, but that the postoperative frequency 
thresholds after retraining were equal to the 
preoperative thresholds. However, if the 
lesion involved the combined destruction of 
SII and the auditory cortex bounded by the 
suprasylvian gyrus, frequency discrimination 
was lost and could not be relearned. It is also 
of importance to note that for two of the
animals that relearned after partial lesions,
an increase of the interpulse interval to
seconds affected the discrimination ability
only in a limited manner; this suggests that
short-term-memory loss may not be a significant
problem when lesions are not complete.
Also of importance was the observation that
the total lesions that resulted in inability to
relearn the frequency-discrimination task did
not result in an inability to relearn an
intensity-discrimination task.

Butler, Diamond, and Neff (1957), using 
a single-response procedure (Task 1, Figure 
1) and asymmetrical shock reinforcement, 
also investigated frequency discrimination in 
the cat. Contrary to the findings of Allen 
(1945) and Meyer and Woolsey (1952), they 
found that removal of the areas AI, AII, Ep,
and SII did not result in an inability to relearn
the frequency-discrimination task, though amnesia
did occur. Goldberg and Neff (1961a)
used the same procedure but with more extensive
lesions in order to guard against the possibility
that the differences between Butler's
results and those of Allen and of Meyer and
Woolsey may have reflected lesion differences.
Their lesions included AI, AII, Ep, SII, the
insular-temporal cortex, and the suprasylvian
gyrus of the cat, but even with this extensive
lesion they found that frequency discrimination
could be relearned. Thus, procedural differences
in testing probably were the source
of the divergent results. What were they?
First, in the Butler and Goldberg studies,
intertrial intervals were filled with the neutral
tone pulses rather than silence, and the
onset of the trial involved the presentation of 
a higher-frequency tone, which then alter- 
nated with the neutral frequency before shock 
was administered. In Allen's study, absolute 
recognition was required, while in the Meyer 
and Woolsey study, a single higher-frequency 
pulse was the positive stimulus; in both, the 
intertrial intervals were silent. While other 
differences existed, it would appear that these 
were the significant ones. Certainly, absolute 
recognition with 2-minute intertrial intervals 
in Allen’s study would appear to provide a 
more difficult task than Butler’s in which the 
different frequencies alternated at 2-second 
intervals. In the Meyer and Woolsey study, 
only a few neutral pulses preceded a single 
presentation of the higher-frequency positive 
stimulus. Consequently, the animal had a re- 
sponse time of only 3 seconds. Which of these 
differences (short train of neutral pulses or 
short response time to a single positive tone)
resulted in inability to relearn the frequency- 
discrimination task has not as yet been 
determined. 

Since testing procedures do play such an 
important role, Thompson (1959, 1960) also 
carried out a study with cats to investigate 
the effects of some procedural differences. He 
studied three different discrimination-testing 
procedures to determine ease of learning pre- 
operatively, and, for the two procedures that 
could be learned, he determined the effect of 
ablation of all cortex bounded by the suprasylvian
and rhinal sulci (AI, AII, Ep, SII,
and I-T) upon performance. All procedures
used a running cage and avoidance training 
with asymmetrical reinforcement. The first 
task was a go, no-go absolute frequency- 
recognition task (Task 9, Figure 1), a 2- 
second 1000-Hz. tone being the negative 
stimulus and a 2-second 1500-Hz. tone being 
the positive stimulus; intertrial intervals 
ranged 5-60 seconds. In case of failure to 
respond to the positive stimulus, a shock
reinforcement was immediately given. The
negative stimulus was never followed by a
shock. None of the cats learned the absolute 
procedure to criterion in 75 days of training 
(1,500 trials). The second discrimination was 
a go, no-go task and was termed an “alternation
procedure”; it involved the recognition
of whether the auditory input was constant
or alternated in frequency (Task 11, Figure 
1). The negative stimulus consisted of eight 
tones of the same frequency (630-millisecond 
pulses with 370-millisecond intervals), and the 
positive stimulus consisted of eight tones 
alternating between 1000 Hz. and 1500 Hz.; 
in this case, absolute recognition was not re- 
quired, though the animal had to recognize 
when tone trains consisted of constant fre- 
quency pulses and when the pulses alternated 
in frequency. This task was learned in 500- 
600 trials. The final procedure was a single- 
response detection procedure similar to that 
used by Butler et al. (1957) in which the 
intertrial intervals were filled with neutral 
1000-Hz. tones, and the trials consisted of 
pulses that alternated between 1000 and 1500 
Hz. This last procedure was learned very 
quickly, in about 100 trials. 

The final procedure was not only learned 
quickly but was also not affected significantly 
by cortical ablations, relearning requiring 
only about half as long as the original learn- 
ing. The “alternation procedure,” however, 
was affected, since the animals responded posi- 
tively to both negative and positive stimuli— 
the usual outcome with asymmetrical shock 
reinforcement. When countershock for false 
positive responses was introduced, the total 
number of responses was reduced, but dis- 
crimination did not improve. Also of great 
significance was Thompson’s finding that the 
cats unable to discriminate on the alternation 
procedure could respond correctly on the 
single-response detection task. Here, then, is 
clear evidence of the critical importance of 
the testing procedure, the easier single- 
response detection task suffering less than the 
go, no-go recognition task. 

It should be noted that in the single- 
response detection tasks so far discussed, the 
task involves the detection of a tone of a new 
frequency, which then alternates with the 
intertrial interval neutral pulses. However, 
this method of presentation can be reversed 
so that alternating tone pulses form the inter- 
trial interval neutral background, and the 
trial consists of a dropout of one of the fre- 
quencies (Task 6, Figure 1). Diamond et al. 
(1962) used this testing procedure with cats 
and found that for normal animals, the task 
was much more difficult to learn than the
task requiring the detection of a new fre- 
quency. Further, complete bilateral ablation 
of AI, AII, and Ep permanently eliminated 
the ability to detect the dropout; a cat with 
a portion of AI remaining, however, was able 
to relearn this task. All cats were able to 
learn the detection-of-a-new-frequency task. 
Quite clearly, test procedures are of critical 
importance. 

It is possible to eliminate the ability to 
detect a new frequency even in the easy 
single-response detection task, either by tran- 
secting the auditory system at the sub- 
cortical level or by extending the cortical 
lesions. Goldberg and Neff (1961b) carried 
out the first study and Thompson (1964), the 
second. Goldberg and Neff found that cats 
with complete sections of the brachium of 
the inferior colliculus, which resulted in elimi- 
nation of evoked auditory area cortical re- 
sponses, were unable to make frequency dis- 
criminations. On the basis of these findings 
and their earlier findings, which demonstrated 
that complete removal of the auditory cortex 
did not eliminate frequency discrimination, 
they suggested that in the absence of auditory 
cortex, frequency discrimination may be
mediated at subcortical levels.

Thompson (1964) created lesions in the cat
that included AI, AII, Ep, SII, I-T, and, in
addition, the three posterior nonspecific sensory
association areas, AMSA, PMSA, ALA
(Figure 2), as well as all of the visual cortex.
Using a single-response detection-of-new-frequency
task, he found frequency discrimination
was no longer possible following these
purely neocortical lesions and proposed that
the critical areas, in addition to the auditory
cortex, are the three posterior association
areas. He suggested that these association
areas represent localized cortical projections
of the ascending reticular formation and that
their arousal via the reticular formation
habituated to the repetitious neutral auditory
input during intertrial intervals. Presentation
of a new frequency, however, would arouse
the association areas and serve as adequate
cues for responding. Thus, for the single-response
procedure, with its filled intertrial
intervals, the association areas could "take
over" from the auditory areas when only the
detection of a new signal was required.

It is difficult to reconcile the Goldberg and 
Neff and the Thompson explanations that 
tend to locate the functional areas for fre- 
quency discrimination in the absence of the 
auditory cortex in subcortical and cortical 
centers, respectively. However, a later study 
by Thompson and Smith (1967) lends support
to Thompson's interpretation. In this
study, normal cats and those with bilateral
ablations of all association response areas
(middle suprasylvian gyrus, anterior lateral
gyrus, and pericruciate area) were compared
on their abilities to learn an absolute
frequency-discrimination task and a single-response
detection-of-new-frequency task. For
the absolute discrimination, S+ was 250 Hz.
and S- was 2000 Hz.; for the single-response
task, the neutral intertrial interval tones were
2000 Hz., and a trial consisted of two pulses
of a 250-Hz. tone, alternating with the 2000-Hz.
tone. For the absolute discrimination, the
normals and three of four of the ablated animals
reached criterion. However, on the single-response
task, while all normal animals
reached criterion, none of the ablated animals
did, the difficulty of the ablated animals
arising from the fact that they made too few
responses to the trials, rather than too many
responses to mock trials. Such findings seem
rather surprising in view of the usual assumptions
that absolute discriminations represent
a more difficult task than do relative discriminations.
It would appear that with the
removal of the association response areas the
introduction of a new frequency after the
repetitious presentation of the neutral frequency
was not effective in arousing the animal.
On the other hand, the absolute discrimination
task which used silent intertrial intervals
was not so hampered. Possibly the
association areas, rather than the primary auditory
areas, function to arouse the animal when,
with continuing auditory input, a new input
occurs. If this is so, then possibly the effect
of cutting the brachium of the inferior colliculus
may reflect impairment of projections
from the inferior colliculus to nonspecific
systems (Hoelle, 1968; Thompson & Smith,
1967), rather than the elimination of
cortical auditory centers.

Jerison and Neff (1953) used monkeys in
a single-response detection-of-new-frequency
task similar to those used with cats. Bilateral 
ablation of the auditory areas did not inter- 
fere with the frequency discrimination, and 
it was suggested that the cortical projection 
areas of the monkey do not play a more 
important role than do those of the cat. 

A study by Evarts (1952) on absolute fre- 
quency recognition by monkeys involved a 
go, no-go procedure with both positive and 
negative reinforcements (Task 9, Figure 1). 
The response was the lifting of the cover of a 
foodwell. In this study, one of two frequencies
(350 or 3500 Hz.) was presented during
a trial, the lower frequency being the positive 
stimulus (for which food was presented when 
the foodwell cover was removed) and the 
higher frequency being the negative stimulus 
(for which shock was presented when the food- 
well cover was touched). Following extensive 
bilateral ablations of the auditory cortex, ani- 
mals were tested, and it was found that the 
ablation did not interfere with the initial
learning of the frequency-discrimination re- 
sponse and only disrupted the frequency- 
discrimination response learned preoperatively 
in a limited fashion. Here, while the task was 
termed “absolute recognition,” only 5 seconds 
elapsed between trials, so short-term pitch
memory and detection of differences, rather 
than absolute recognition, may have been 
involved. Further, the lack of extensive inter- 
ference suggests that the ablations, which were 
estimated to involve about 90% of the audi- 
tory cortex, left sufficient cortical tissue to 
accomplish the discrimination task.

Massopust, Barnes, and Bidura (1965) 
also used monkeys in studying absolute fre- 
quency recognition. They used a two-alterna- 
tive, forced-choice avoidance procedure in 
which the frequency presented determined 
which of two bars to be pressed was correct;
shock was given for incorrect responses, and
correction was required. Intertrial interval durations
varied randomly from 5 to 30 seconds.
Animals were required to differentiate between
either .5 kilohertz (kHz.) and 1 kHz., 4
kHz. and 4.5 kHz., or 7 kHz. and 8 kHz. After
ablation of the superior temporal gyrus and 
the anterior-inferior surface of the postcentral 
gyrus to a uniform depth of 1 centimeter, 
some animals were able to relearn the .5
kHz.-1 kHz. discrimination to the preablation
90% criterion without great difficulty. However,
the 4 kHz.-4.5 kHz. and 7 kHz.-8 kHz.
discriminations could not be relearned to cri- 
terion. Whether this greater interference for 
the higher-frequency discriminations reflects 
differences in the coding of frequency infor- 
mation at high and low frequencies cannot be 
answered because relative frequency differ- 
ences were smaller for the higher than for 
the lower frequencies. From their findings, 
Massopust et al. (1965) suggested that abso- 
lute recognition suffers more than does the 
detection of frequency changes. Although 
they attempted to evaluate the extent of the 
ablations by recording evoked potentials from 
the auditory cortex just prior to sacrificing 
the animals, they reported that the findings 
were not definitive, finding evoked potentials 
for some animals showing discrimination, but 
not for others. This, however, might merely 
reflect the difficulty of completely examining 
the monkey’s auditory area in this manner. 
That is, failure to observe any evoked poten- 
tial may not prove conclusively that no func- 
tional cortical remnant remained, since the 
area is buried in the sylvian fissures and 
difficult to explore thoroughly. 
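
The confound noted above, that the relative frequency differences were smaller for the higher-frequency pairs, can be checked with simple arithmetic. The following is a minimal sketch, not part of the original studies: it takes the three frequency pairs reported for the Massopust et al. (1965) task and expresses each difference as a fraction of the lower frequency. The choice of the lower frequency as the reference and the function names are assumptions made for illustration.

```python
# Frequency pairs (in kHz) from the two-bar task described in the text.
PAIRS_KHZ = [(0.5, 1.0), (4.0, 4.5), (7.0, 8.0)]


def relative_difference(low, high):
    """Frequency difference as a fraction of the lower frequency.

    Using the lower frequency as the reference is an illustrative
    assumption; the original report may have used another convention.
    """
    return (high - low) / low


def relative_differences(pairs):
    """Relative difference for each (low, high) pair, rounded to 3 places."""
    return [round(relative_difference(low, high), 3) for low, high in pairs]


if __name__ == "__main__":
    for (low, high), rel in zip(PAIRS_KHZ, relative_differences(PAIRS_KHZ)):
        print(f"{low} kHz vs. {high} kHz: relative difference = {rel:.3f}")
```

Under this convention the low-frequency pair differs by a full octave, while the two higher pairs differ by roughly one-eighth and one-seventh of the lower frequency, which is why the poorer relearning at high frequencies cannot be attributed to frequency region alone.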

In a second study, Massopust, Barnes, 
Meder, and Meder (1966) determined the 
effect of frontal lobe lesions, separately, and 
in combination with auditory area lesions, 
using the same two-alternative, forced-choice 
procedure with monkeys. They found that 
partial bilateral removal of the auditory areas 
resulted in only short-lived discrimination 
deficits, while total bilateral ablation of the 
auditory area resulted in severe deficits, 
though escape responses remained. When par- 
tial auditory ablations were accompanied by 
total bilateral frontal lobe ablations, escape 
and avoidance responses were permanently 
lost, though from observations of the animals'
alerting behavior at trial onsets, it appeared 
that hearing was not lost—only the ability 
to respond correctly. From this, it appears 
probable that auditory area ablations affect 
auditory functioning, rather than behavioral 
capabilities; frontal lobe ablations, however, 
appear to affect the animal’s cognitive behavior
by interfering with the animal’s ability
to pair up the appropriate response to the
auditory input. 

In a later study, Massopust, Wolin, Meder, 
and Frost (1967) used the same two-bar pro- 
cedure with monkeys to determine the fre- 
quency difference threshold (75% level per- 
formance) for a 500-Hz. standard tone. 
Preablation thresholds ranged below 5 Hz. In 
this study, partial auditory area lesions pro- 
duced an increase in threshold size, roughly 
related to the size of the lesion. Probably the 
lack of effect upon frequency discrimination 
reported in several of the earlier studies re- 
flects the fact that most discrimination tasks 
have used suprathreshold differences. 
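
A difference threshold defined at the 75% performance level can be estimated from percent-correct scores obtained at several frequency differences. The sketch below is one simple way of doing so, by linear interpolation between adjacent test points; it is not the procedure reported by Massopust et al. (1967), and the function name and the data in the usage example are hypothetical.

```python
def threshold_at(differences, proportions, criterion=0.75):
    """Interpolate the stimulus difference at which performance reaches criterion.

    differences: stimulus differences in ascending order (e.g., Hz).
    proportions: proportion correct at each difference.
    Returns None when performance never reaches the criterion.
    """
    points = list(zip(differences, proportions))
    # Criterion already met at the smallest difference tested.
    if points and points[0][1] >= criterion:
        return points[0][0]
    # Find the first pair of adjacent points that brackets the criterion.
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return x0 + (criterion - p0) * (x1 - x0) / (p1 - p0)
    return None


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    diffs_hz = [2, 4, 8, 16]
    correct = [0.5, 0.625, 0.875, 1.0]
    print(threshold_at(diffs_hz, correct))
```

With a measure of this kind, a lesion that leaves a large suprathreshold discrimination intact can still show up as an increase in the interpolated threshold.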

Brown, Gedvilas, and Marco (1967) carried 
out a go, no-go recognition-of-new-frequency 
task with avoidance conditioning and sym- 
metrical reinforcement. In this task, the cat 
was not only required to detect the onset of 
a new frequency but also to determine which 
of two possible new frequencies had been 
presented. Intertrial intervals were filled with neutral 1-kHz.
pulses, while a trial consisted of a different
frequency (either .8 or 1.2 kHz.) which
alternated with the neutral pulse (Task 8,
Figure 1). One of the new frequencies was
the positive stimulus, and the other the negative
stimulus. After ablation of areas AI, AII,
Ep, SII, I, and T (with small remnants re- 
maining for some cats), five of the six animals 
were able to relearn this task. This indicates 
that the animals were thus capable of recog- 
nizing the direction of shift for the new fre- 
quency and were responding to more than a 
dimensionless change. 

Cornwell (1967) determined the effect of 
removing the I-T areas upon absolute fre- 
quency recognition with a go, no-go avoidance 
procedure. Intertrial interval durations ap- 
proximated 1 minute, and the frequencies to 
be recognized were .6 and 1 kHz. Ablated cats 
were unable to learn this task, though control
animals with the occipito-temporal area
removed were able to learn it.

Dewson (1964), with a three-bar procedure,
was able to demonstrate recognition of
two frequencies, 1.1 and 3.7 kHz., in cats.
Such recognition was unaffected by I-T lesions
and lesions involving AI, AII, and Ep. Intertrial
intervals were about 6 seconds and
included food reinforcement. 

A series of Russian studies have been 
reported in which signal duration was ma- 
nipulated (Baru, 1966; Gershuni, Baru, & 
Karaseva, 1968; Khananashvili, 1966). Dogs 
were the experimental animal. These studies 
reported that auditory area ablations affected 
frequency recognition only for short signals 
(below 100 milliseconds). For longer signals,
frequency discrimination was unaffected. It 
was suggested that for short signals the 
cortex provides short-term memory and thus 
improves discrimination. Removal of cortical 
tissue therefore interfered with short-signal 
discrimination; longer signals, presumably, do 
not require such “assistance.” 

Thus far, the studies have used open-field 
testing conditions in which both ears are 
exposed (though in some studies one ear had 
been surgically destroyed). In such studies,
unilateral lesions were found ineffective,
and no evidence of hemispheric dominance
developed.

Kaas, Axelrod, and Diamond (1967), how- 
ever, carried out a series of extremely signifi- 
cant studies in which tones were presented 
alternately to the cat's two ears by means of 
earphones. Figure 3 shows the original task 
in which it may be noted that the “ignoring” 
ear was presented with alternating frequencies 
that did not change during the warning 
signal. Tones to the “attending” ear were
changed to provide the warning signal in the 
single-response avoidance task. In such a situ- 
ation the animal might attend only to the 
input to the “attending” ear or to the total 
binaural pattern, which also changed during 
the warning trial. In order to determine 
which set of cues the animals were using, 
once the original task had been learned, two
equivalence tests were introduced (Figure 4).
In Test A, the tones of both ears were altered 
in order to produce a change in the binaural 
pattern, though the “attending” ear continued
to receive only low tones. In Test B,
the original warning signal to the “attending” 
ear was used, while the tones to the 
"ignoring" ear were altered so that the binaural
pattern remained similar to that of the
intertrial interval neutral signal. Animals responded
to Test B 95% of the time and to
Test A about 1% of the time. Thus it was
clear that the animal was, in fact, attending 
only to the change in input to the "attending" 
ear. Later monaural equivalence tests verified 
this conclusion, since, when the tones pre- 
sented in the original task to the "attending" 
ear were presented to the "ignoring" ear, the 
animal failed to respond, though animals 
trained initially with the same task mon- 
aurally generalized immediately when the 
tones were presented to the other ear. Thus 
it appears that under the original binaural 
training conditions, the animals learned to 
ignore input to the "ignoring" ear. 

When unilateral lesions of the auditory 
cortex were later created, only those abla- 
tions in the hemisphere contralateral to the 
“attending” ear affected performance. Hemi- 
spheric dominance for this particular task, in 
which input to only one ear was attended to, 
was thus demonstrated, and selective atten- 
tion appears to be the special feature which 
made the habit susceptible to contralateral 
lesions. However, it is interesting to note 
that animals with contralateral lesions were 
able to relearn the original task and showed 
responses to the equivalence tests similar to 
the normal animals. The auditory cortex in 
the remaining hemisphere was then removed, 
and these animals were also able to relearn 
the original task and responded in a similar 
fashion to the equivalence tests. Thus, while
the selectivity demonstrated by normal ani- 
mals appeared to depend upon the functional 
integrity of the contralateral cortex, it could 
be taken over by the remaining ipsilateral 
cortex, and when this was also removed, it 
could still be demonstrated. The authors sug- 
gested that the performance by both the 
normal and ablated animals depended upon 
the habituation—to both tones by the ig- 
noring ear and the low tone by the attend- 
ing ear. Thus, the animal needed only to await 
the appearance of the high tone in the at- 
tending ear to respond correctly, and this 
ability clearly does not require the presence 
of the auditory cortex. 

It is difficult to summarize concisely the 
studies on frequency discrimination since they 
vary in many ways. However, certain general 
conclusions do appear to be warranted: 

1. The probability that an auditory area
lesion will result in interference with fre- 
quency discrimination depends on its size 
rather than on its location. Further, if rem- 
nants of the auditory areas of either hemi- 
sphere remain, then discrimination may be 
affected little, if at all, provided that a single- 
response detection-of-new-frequency task is 
used and suprathreshold differences are pre- 
sented. This latter fact increases the difficulty 
of evaluating results, since it is possible that 
unnoted remnants may sometimes remain. 
Thus, a likely error of interpretation is
underestimation of the effect of a
presumably complete lesion.

2. Different types of discrimination task 
vary greatly in difficulty of initial acquisition 
and (correlated with this) the probability of 
suffering from lesions. Detection of new fre- 
quencies appears to be the easiest task to 
learn and most resistant to lesion effects (with 
the exception of ablations involving the asso- 
ciation areas). However, it is difficult to rank 
other types of discrimination tasks, such as 
detection of frequency dropout, detection and 
recognition of direction of frequency change, 
recognition of frequency-varying and fre- 
quency-constant tone trains, and absolute fre- 
quency discrimination. A sensible research 
strategy would appear to be the use of the 
more difficult discrimination tasks, since they 
may well be more sensitive to less-than- 
complete lesions. It is also remotely possible 
that the use of the more difficult discrimina- 
tion tasks might result in a delineation of 
functional differences of the various portions 
of the primary auditory area. Thus, the equi- 
potentiality noted in several studies might no 
longer be found to hold. 

3. Very few studies have concerned them- 
selves with frequency discrimination at dif- 
ferent standard frequencies. However, limited 
evidence does suggest that discriminations
may suffer more for higher than for lower
frequencies.

4. With particular reference to the results
reported by Neff and his colleagues, and those 
reported by Thompson on the effects of de- 
stroying the primary auditory area and/or the 
nonspecific association response areas and the 
subcortical auditory pathways, it appears that 
interference with discrimination may depend
both upon the nature of the discrimination 
task and the location of the lesion. Thus, 
single-response detection-of-new-frequency 
tasks are unaffected by ablations of the pri- 
mary auditory area, though absolute discrimi- 
nation of frequency may suffer. On the other 
hand, damage to the nonspecific association
response areas appears to interfere with the detection
of a new tone when it follows the repetitious
presentation of a neutral tone, but not
recognition of frequencies when intertrial 
intervals are silent. The two types of dis- 
crimination tasks apparently do not involve 
the same abilities, and these abilities appear 
to suffer differentially from lesions in different 
locations. 


Intensity Discrimination 


Relatively few intensity-discrimination
studies have been carried out. Rosenzweig
(1946) and Raab and Ades (1946) both used
a single-response asymmetrically reinforced
avoidance procedure in which the cat's task
was the detection of an increase in the intensity
of tone pulses (similar to Task 2, Figure
1). In Rosenzweig's study, complete or nearly
complete bilateral ablations of the auditory
area resulted in amnesia, though the animals
were able to relearn the task, and the discrimination
threshold was not changed. Raab
and Ades carried out ablations at both the
cortical and midbrain levels and also found
that complete cortical ablations resulted in
amnesia, though relearning and normal thresholds
were possible. Bilateral destruction of the
inferior or superior colliculi, however, had no
effect upon discrimination performance, unless
it was accompanied by cortical ablations;
in such cases, amnesia occurred, and large increases
in the discrimination thresholds resulted.
From these findings, the authors concluded
that discriminations following cortical
ablations were mediated by the midbrain
auditory centers and that discrimination at
the bulbar level occurred when cortical and
midbrain centers were destroyed. Normally,
however, it appears that discriminations are
mediated at the cortical level.

Oesterreich and Neff (1961) also found
that extensive cortical ablations had little
effect on intensity discrimination in the cat.
While details are not reported, it would ap- 
pear that they used a single-response detec- 
tion-of-a-louder-tone procedure (Task 1, Fig- 
ure 1). They found that removal of AI, AII, 
Ep, SII, I-T, and the anterior and middle 
suprasylvian gyri had little effect on intensity- 
difference thresholds. 

Meyer and Woolsey (1952) investigated 
intensity-discrimination ability in the same 
animals in which frequency discrimination was 
eliminated by cortical ablations. They found 
that discrimination of an increase in intensity 
was still possible. 

The Russians also reported on intensity 
discrimination for brief signals (Gershuni;⁴
Gershuni et al., 1968; Khananashvili, 1965,
1966). As with frequency discrimination, they 
found that with short signals (less than 100 
milliseconds in one study, less than 20 milli- 
seconds in another) cortical ablations resulted 
in deterioration of intensity discrimination. 
Longer signals were not affected. They sug- 
gested that there are two different mechanisms 
for signal analysis, spatially separated. The 
first, with a short critical summation time, is 
cortically based and affected by cortical abla- 
tions. The process involves the establishment 
of traces of very short signals (memory) of 
sufficient length to permit discrimination. 
Longer signals, presumably, can be processed 
subcortically. Certain human patient data 
were cited to support this interpretation. 

To summarize, while relatively few inten- 
sity-discrimination studies have been reported, 
the data tend to suggest that cortical ablations 
may have less effect on intensity-discrimina- 
tion than on frequency-discrimination per- 
formance. One limitation on this observation 
should be noted. In the single-response studies 
that have been reported, trials have consisted 
of an increase in intensity. Whether the ani- 
mals would be equally capable of detecting a 
decrease in the signal intensity has not been
determined.


4 G. V. Gershuni. Investigation of the neural
physiological mechanisms in the process of external
signal discrimination. Paper presented at the Tenth
Congress of the I. P. Pavlov All-Union Physiological
Society, Yerevan, 1964.


Localization and Lateralization 


Neff, Fisher, Diamond, and Yela (1956) 
determined the effect of cortical auditory area 
lesions upon the cat's ability to locate a 
buzzer located behind one of two (or three) 
food boxes. Since intertrial intervals were 
silent, the task involved absolute discrimina- 
tion, that is, recognition of the buzzer's loca- 
tion. While unilateral lesions produced little 
effect, bilateral ablation of AI, AII, and Ep
resulted in performance at near-chance level; 
increasing the angular displacement improved 
the performance, but it still remained quite 
poor. 

Strominger (1969a), using a procedure 
similar to Neff's, determined the effect of 
ablating bilaterally various subdivisions of the 
auditory cortex upon the threshold of localiza- 
tion (angular separation at which cats per- 
formed at a 75% correct level). He found 
that only ablation of AI affected localization. 
In a second study, Strominger (1969b) determined
the effect of large unilateral and bilateral
lesions upon localization. He found
that most animals with unilateral ablations
of AI, AII, Ep, I-T, and SII (and for some
animals the further ablation of the anterior 
and middle portions of the suprasylvian 
gyrus) suffered moderate deficits in localiza- 
tion. When similar lesions were created in the 
second hemisphere, all animals' performances
fell below threshold for a 90-degree angular 
separation, though performance did remain 
above chance. In a later study, Strominger 
and Oesterreich (1970) determined the effect 
of sectioning, unilaterally and bilaterally, the 
brachium of the inferior colliculus. Large
transient losses and some permanent impair- 
ment followed unilateral lesions and a com- 
plete loss of localization (up to angular sepa- 
ration of 135 degrees) followed bilateral 
transections. Thus, such lesions resulted in 
more severe deficits than found from complete 
bilateral cortical ablations. The authors sug- 
gested that in transecting the brachia adjacent 
pathways were also cut and that this may 
account for the more severe deficits. 

Masterton and Diamond (1964) studied 
the effect of cortical ablations upon the cat's 
ability to differentiate between clicks presented
via earphones to the two ears. Using
an asymmetrically reinforced single-response 
avoidance procedure in which the intertrial 
interval neutral stimuli were clicks presented 
to the left ear, while the trial consisted of 
clicks presented to the right ear (Task 2, 
Figure 1), they found that after learning this 
task, normal animals immediately transferred 
to a task in which the intertrial interval neu- 
tral stimuli consisted of binaural click pairs 
with the left-ear click preceding the right- 
ear click by .5 millisecond, and the positive 
stimuli consisted of click pairs with the right- 
ear click preceding. Thus it appeared that 
these click pairs, in which one click preceded 
the other, were perceptually equivalent to mon- 
aural clicks to the same ear as that of the 
leading click. However, cats with bilateral 
ablations of AI, AII, Ep, SII, and the I-T 
areas showed no transfer whatsoever, indi- 
cating that loss of the cortical areas elimi- 
nated the perceptual equivalence of the click 
pairs and the single click. However, these ani- 
mals were able through extensive training to 
learn the discriminal response to a better- 
than-chance level. The authors suggested that 
the learning involved the utilization of new 
cues (possibly not involving the percept of 
location). They also suggested that the in- 
ability of Neff's animals to learn the localiza- 
tion task after cortical ablations may have 
reflected the absolute nature of Neff's locali- 
zation task, as contrasted with the relative 
discrimination required in this study in which 
the animals were required merely to detect the 
shift from the left-leading click pair to the
right-leading click pair. 

This interpretation is supported by an 
Axelrod and Diamond study (1965) in which 
a go, no-go procedure (Task 9, Figure 1) was
used in place of the single-response procedure 
used by Masterton and Diamond (1964). In 
this study, intertrial intervals were silent, 
and the animals were required to determine 
whether clicks occurred in the left or in the
right ear. Here, bilateral ablations resulted in 
amnesia and an inability to learn the discrimi- 
nation; though, as with the Masterton and 
Diamond study, the animals did learn to 
detect a shift from left-ear click to right-ear 
click. Here, then, is very convincing evidence
that the loss of cortical tissue resulted in an
inability to recognize in an absolute manner 
the location of the clicks—and possibly 
the loss of the percept of location as well. 
Whether, in fact, the ablated animals' ability 
to detect the shift in location involved detec- 
tion of some sort of undifferentiated change 
or whether it included an ability to differen- 
tiate between shifts from left to right and 
from right to left remains as yet unanswered. 

Thompson and Welker (1963) investigated
localization skills by means of unconditioned 
head-orienting responses to short bursts of 
noise presented, in random order, through a 
loudspeaker located either on the left or on
the right of the cage. Significantly poorer 
orienting responses occurred for animals with 
bilateral ablations of areas AI, AII, Ep, SII,
and I-T. The authors also interpreted their 
findings as indicating that the ablated animals, 
while performing more poorly, failed to habituate
to the repeated noise presentations, as
did the normal animals. However, the method 
of scoring the orienting responses makes it 
impossible to determine whether the lack of 
habituation was “real” or reflected only a
constant chance level of performance.

Riss (1959) determined the ability of bilaterally
ablated cats (AI, AII, Ep) to locate
a piece of food dropped or rapped repeatedly 
on the cage floor. He found a marked decre- 
ment in performance, though location was 
possible if the sound were continued. It was 
suggested that the animal, by suitable movements
of its head, was able to detect differences
in the loudness of the tone and could
respond appropriately to this cue. If true,
then the percept of location was not lost by 
these animals. This is not necessarily in disagreement
with the Diamond studies mentioned
above, since the lesions in the Riss
study were less extensive than those of the
Diamond studies.

These studies of localization and lateralization
are of great interest, since they involve
the interpretation of such primary auditory 
cues as binaural intensity and time differences. 
It appears probable that such a task is more 
likely to involve cortical activity than merely 
the detection (or recognition) of inherently 
different auditory signals having little significance
beyond the fact that they do
differ. If this is true, the greater effect of
cortical ablations upon localization than upon 
detection of loudness or pitch changes is 
understandable. 


Temporal Discriminations 


Rates of stimulus presentation and signal 
duration are two types of temporal discrimi- 
nation that have been studied. Allen (1945) 
determined the effect of complete and incom- 
plete bilateral lesions upon the ability of the 
dog to differentiate rates of tapping a bell 
(1 per second versus 3 per second rates). His 
test involved absolute discrimination with a 
go, no-go asymmetrically reinforced avoidance
procedure. Incomplete lesions, if extensive
enough, interfered with retention of the
task, but the task could be relearned. Total 
bilateral lesions, however, resulted in a 
nonreversible loss. As might be expected, the 
nondiscriminal behavior was reflected by posi- 
tive responses to both positive and negative 
stimuli.

Scharlock et al. (1965) used a single- 
response procedure with filled intertrial inter- 
val (Task 7, Figure 1) in which the cat's 
task was to detect the change from 4-second 
pulses to 1-second pulses. Animals in which 
the lesions were confined to the projection 
areas of the medial geniculate were able to 
relearn the task. However, animals in which 
the lesions included the total medial genicu- 
late projection areas and also extended into 
either SII or the suprasylvian gyrus were 
unable to relearn the task. Since even with 
these extensive lesions animals trained with 
a similar procedure to discriminate frequency
difference could still do so, it would appear 
that duration discrimination suffers more 
from cortical ablations than does frequency 
discrimination. 

French (1942) used rats in a go, no-go
asymmetrically shock-reinforced study of ab- 
solute click-rate discrimination. He found 
that bilateral lesions involving most of the 
auditory areas resulted in deficits in the abil- 
ity to detect small differences in rate, though 
discriminations were still possible if click-rate 
differences were large enough. 


Stepien, Cordeau, and Rasmussen (1960) used
African green monkeys in studying the effect
of ablations upon click-rate recognition. Food 
was presented for correct positive responses. 
Positive click rate was 7 per second, and 
negative click rate was 17 per second. They 
found that an animal, in which the neocortex 
of the first and second superior temporal gyri 
anterior to the primary auditory areas was 
removed, was able to learn this discrimination 
postoperatively with little difficulty. Of 
course, more than temporal information was 
probably involved in such discriminations. 
The power spectra of the two click rates 
undoubtedly differed. 

Symmes (1966) used a go, no-go procedure 
and required monkeys to recognize intermit- 
tent (10 interruptions per second, 80% duty 
cycle) and continuous noise. Symmetrical 
food reinforcement was used, and intertrial 
intervals were 15 seconds. Complete and near- 
complete ablations of the focal-zone auditory 
cortex resulted in loss of the discrimination 
and inability to relearn the task. Where sig- 
nificant portions of the auditory area re- 
mained, however, the task could be relearned. 

Khananashvili (1965, 1966) used an asym- 
metrical food-reinforced procedure to condi- 
tion dogs to recognize rise-time differences of 
2-second noise pulses. S+ had a rise time of 
30 milliseconds, and S− had various longer
rise times. He found that bilateral removal
of the ectosylvian and suprasylvian gyri in
one dog increased the rise-time discrimination
threshold so that S− with a 130-millisecond
rise time could not be discriminated from the
S+ with a 30-millisecond rise time. S− with
rise times of 250 milliseconds and longer 
could, however, be recognized. Since he found 
that frequency and intensity discrimination of 
very short signals was also affected by such 
lesions (see above), he suggested that the 
cortex's contribution to the discrimination of 
very brief acoustical phenomena depends on 
its ability to prolong and maintain the 
processes evoked by the brief inputs. 

One cannot draw any sweeping conclusions 
from these few studies. However, it does ap- 
pear that detection of changes in duration 
can be relearned after total bilateral auditory 
ablations. Absolute discrimination of click 
rates is also possible after extensive lesions,
though total auditory lesions may result in
an inability to relearn the differences (1 per
second versus 3 per second rates). It also
appears quite possible that temporal informa- 
tion based on very brief changes in input may 
also suffer from cortical ablations. 


Discrimination of Complex Spectral 
Differences 


Some discrimination studies have involved 
signals with multiple differences. In such
studies, it is difficult to specify which of the
differences are serving as cues for discrimina- 
tion, and interpretation must be tentative. 

Cornwell (1967), as a part of a larger 
study, determined the effect of bilateral I-T 
lesions in cats on their ability to learn, post- 
operatively, to discriminate between the hiss 
of an air jet and a metronome click (3.5 per 
second rate). He used a go, no-go symmetri- 
cally reinforced avoidance procedure in which 
failure to respond to the positive stimulus 
resulted in a shock, and response (in the 
shuttlebox) to negative stimuli was punished 
by pushing the shuttlebox partition in front 
of the jumping cat’s head. He found that this 
complex discrimination was learned as quickly 
by the ablated animals as by normal control
animals.

Stepien et al. [...] monkeys, determined [...] removal
of the first [...] temporal gyri anterior [...]

Chorazyna and Stepien (1961) [...]metrical shock-reinforced
[...]ation used dogs to determine the effect of bilateral
removal of the an[...] sylvian gyri. They reported that the
animals were able to di[...]nals; [...]


Symmes (1966) reported that monkeys
with bilateral ablations of the focal-zone auditory
cortex, while unable to discriminate
chopped from continuous noise, had little difficulty
in recognizing a 1-kHz. pure tone and
noise (go, no-go procedure with symmetrical 
food reinforcement). Details concerning the 
intensities of the two signals were not given. 

Dewson (1964) used a three-bar procedure
in which, at the start of a trial, the cat was
required to recognize two different frequencies,
or two vowel sounds, [u] and [i]. If
the correct bar was pressed, the signal
changed to the vowel sound [a] that signaled
the arming of the third bar which, when
pressed, produced a food reward. Cats with
bilateral I-T ablations were unable to recognize
the vowel sounds when presented at equal
intensities, though they could recognize different
frequencies and different [...].
Animals with ablations of the primary auditory
area, however, were unaffected in recognizing
any of the signals. In addition, all animals
responded immediately when the signal
changed to the vowel [a]. Here is [...]
evidence that detection of a change in
auditory input is less difficult than recognition
of auditory input.

Dewson, Pribram, and Lynch (1969) used
another three-bar test procedure with monkeys.
With this procedure, the animal was
required to initiate the trial by pressing one
lever. Immediately, one of a particular pair
of stimuli was presented, and by pressing the
correct one of the two remaining bars, [...]
reinforcement was obtained. A 6-second [...]out
followed reinforcement, so at least this
interval intervened between signal presentations.
Bilateral ablations within the boundaries
of the primary auditory projection area
were created in some animals, and for other
animals the lesion was buried within the
superior temporal sulcus (i.e., in the auditory
association area). For animals with lesions of
the primary auditory area, recognition of the
vowel sounds, [i] and [u], was no longer possible
and could not be relearned, while
recognition of noise or an 800-Hz. tone was
virtually unaffected. Animals with lesions of
the auditory association area suffered [...]
but reversible losses of the vowel recognition,
while recognition of the 800-Hz. tone and
noise was only slightly affected. Clearly, the
vowel patterns presented more difficult recognition
problems than did the tone-noise recognition.
In both of Dewson’s studies, he found
that following acquisition of vowel recognition
ability, shifting from a male to a female
voice created little difficulty. Apparently the
animals learned to recognize the pattern of
the speech sounds.

With the limited number of studies and the 
differing types of lesions and experimental 
animals, generalizations are risky. Further, 
since the signals that were used varied in 
complex ways, one cannot identify the nature 
of the cue(s) used in making the discrimina- 
tions. However, it appears probable that the 
greater the number of dimensions along which 
the signals differ, the greater will be the 
number of discriminal cues available, at least 
for detection tasks. If complex discrimina- 
tions are to be studied systematically and 
with the expectation of evaluating the relative 
importance of the various cues, it will be 
necessary to develop signal-generating equip- 
ment that will allow the independent control 
of the differences along each of the dimen- 
sions. With such equipment one could then 
provide the complex cues in many different 
combinations in order to determine the extent 
to which they interact with each other in 
providing signals discriminably different. 


Discrimination of Changes in Temporal 
Patterning 

In the discrimination tasks so far consid- 
ered, certain distinctions in the general nature 
of the discrimination can be made. Some dis- 
criminations have involved signals that, in one 
way or another, set up different patterns of 
neural responding; as a result, the differences 
to be discriminated reside in the immediate 
sensations evoked. Examples are pitch dis- 
crimination, loudness discriminations, and 
discriminations involving complex spectral 
differences. Other discrimination tasks, how- 
ever, do not involve different-sounding signals, 
but require that the discriminator note dif- 
ferences in the duration or rate with which 
the same sound is presented. Thus, the only 
auditory requirement is that the animal be 
able to hear. Since temporal cues transcend 
specific sensory inputs, these latter discriminations
probably represent higher-order
discriminations than do those involving spectrally
different signals.

A number of very interesting studies have 
been carried out in which both temporal and 
spectral cues have been involved in the dis- 
crimination task. While these tasks have been 
labeled differently by various investigators, 
the term “pattern discrimination” used by 
Neff et al. is an accurate description of the 
discrimination. Tasks 4, 5, and 6 in Figure 1 
are examples of pattern discrimination. It will 
be seen that the discriminations involve the 
detection of a change in the temporal order 
of tone presentation. Task 4 involves a change 
in the tone triads with the ratio of high-to- 
low tones reversing, the positive-trial triads 
beginning and ending with high tones rather 
than the low tones with which neutral triads 
begin and end. Thus, in this task, there are 
two possible cues. In Task 5, the ratio of 
high-to-low tones does not change between 
neutral and positive triads; rather, the within- 
triad variability is altered. Task 6 involves 
a change in the ratio of high-to-low tones, 
though the beginning and ending tone of each 
triad remains unchanged. While each of these 
pattern-discrimination tasks requires the de- 
tection of changes in the temporal ordering of 
different tones, they differ in the particular 
cues that are provided. Further, in Tasks 4 
and 5 the same tones appear in both neutral 
and positive triads, and in this sense Task 12 
is similar to them. In addition, in Task 12 the 
bell and buzzer may appear as either the first 
or the second component of the stimulus dyad 
in both positive and negative stimuli. In this 
latter task, then, it is not a change in the 
ongoing temporal ordering that is to be de- 
tected; rather, it is recognition of whether 
or not there is a difference in the spectral 
characteristics of the dyadic components.

The pattern-discrimination studies dis- 
cussed below provide extremely interesting 
data since, generally, pattern discrimination is 
more susceptible to disruption than are dis- 
criminations of frequency, intensity, duration, 
and complex spectral differences. They also 
are more difficult to learn initially. 

Neff and his associates have carried out a 
series of studies on pattern discrimination.
Diamond and Neff (1953) and Jerison and
Neff (1953) determined the effect of cortical
ablations in the cat and monkey upon their
ability to detect changes in the order of
presentation of tones differing in frequency. 
The tasks were similar to Task 4 (Figure 1), 
the two frequencies being .8 kHz. and 1 kHz. 
They found that when AI, AII, and Ep in the 
cat and the primary auditory area of the 
monkey were bilaterally removed, the pattern 
discrimination was lost and could not be relearned.
(If small portions of AI or AII remained,
pattern discrimination was possible.)
However, a simple Task 1 frequency-discrimi- 
nation task involving the same frequencies 
could be relearned. From this, it is evident 
that the inability to respond correctly to the 
pattern task did not reflect an inability to 
detect frequency changes. 

Diamond and Neff (1957), using Tasks 1, 
4, and 5, varied the size of the lesion in the 
cat. When only AI was removed, pattern- 
discrimination responses were unaffected; 
when AI, Ep, and most of AII were removed,
the discrimination was lost but could be re- 
learned. When all of AI, AII, and Ep was 
removed, the discrimination was lost perma- 
nently (see also Neff & Diamond, 1958). 
Task 1 frequency discrimination could, how- 
ever, be relearned. They did not report any 
differences in performance on the two types 
of pattern discrimination (Tasks 4 and 5). 

Diamond et al. (1962) used a “dropout” 
procedure (Task 6) in which the frequency- 
alternating neutral background was changed 
to a positive signal by dropping out the 
higher tone. They found that with this task, 
the initial preablation training was much 
slower than when the neutral and positive 
stimuli were reversed (i.e., Task 1). While
these animals could relearn the simple detec- 
tion-of-new-frequency discrimination (Task 1) 
after complete ablation of AI, AII, and Ep, 
they could not relearn the dropout pattern 
discrimination. However, if a small portion
of AI remained, the dropout discrimination
was relearned.

From this series of pattern-discrimination 
and simple frequency- and intensity-discrimi- 
nation studies Neff developed a neural 
model (Diamond et al., 1962; Neff, 1960,
1961). He noted that auditory tasks which
can be relearned after complete auditory area
ablations involve the presentation of a new 
tone. In the case of absolute threshold determinations,
the procedure involves the presentation
of tones after silent intertrial intervals.
In the case of frequency discrimination, the
procedure involves the detection of the pres- 
entation of a new frequency. In the case of 
intensity discrimination, the procedure in- 
volves the detection of the presentation of a 
more intense tone. Since a new tone was al- 
ways involved, he suggested that new neural 
units were excited and/or a larger neural 
response resulted,5 which, since pattern discriminations
involving only the rearranging of
the high- and low-frequency tones were impossible
for postablation animals, must serve
as the significant cues for such animals. Put
in another fashion, it was suggested that an
animal's successful postablation performance 
on Task 1 or Task 2 frequency- and intensity- 
discrimination tasks depends upon the increase 
in neural activity. 

Two implications grow from this theory. 
First, if a larger neural response is the discriminable
cue, then the animal must be
habituated to the repeated intertrial neutral
stimuli; a corollary of this conclusion is that
since habituation probably does not develop
instantly, postablation performance on Task 1
and Task 2 frequency- and intensity-discrimination
tasks should improve as the intertrial
intervals increase and allow for complete habituation.
(Neff has suggested that the habituation
develops in about 1 minute, though
this question has not been systematically
investigated.) Second, if the ablated animal's
successful response depends exclusively on the
detection of a larger evoked neural response,
then the animal should be unable to differen- 
tiate between different new tones. Thus, the 
ablated animal should be unable to distinguish 
between different frequencies that might be 
presented as positive stimuli and should also
be unable to differentiate changes in frequency
from changes in intensity. Operationally,
to investigate this question, one might
use a Task 8 (Figure 1) test with its constant
neutral tone pulses and two types of
trials, each involving either different frequency
tones or changes in frequency and
intensity. If the ablated animal was able to
respond differentially to the two types of
trials, it would appear that the assumption
that the ablated animal depends exclusively
on the increased neural response as the cue
to respond should be seriously questioned.

5 Neff does not differentiate between these alternatives.
Of course, it is quite possible that newly
activated neural units will also fire more rapidly and
thus produce a larger total neural response, and that
both cues are thus available. To simplify the discussion,
we refer to the increased neural activity,
though it probably involves new neural units.

One study has been reported that used this 
Task 8 procedure. Brown et al. (1967) used 
symmetrical shock reinforcement to train the 
cats to respond when the trial frequency was 
above the neutral tones and to inhibit re- 
sponses when the trial frequency was below 
the neutral tones. After ablations intended 
to remove AI, AII, Ep, SII, I, and T, five 
of the six animals could relearn this task in 
about the same number of trials required 
preoperatively. However, it was reported that 
ablations were not invariably complete, and 
it is difficult to determine the extent to which 
incomplete lesions may have contributed to 
the relearning. At any rate, if completely 
ablated animals can, in fact, differentiate be- 
tween new tones, then the neural model will 
have to be extended. 

Another prediction generated by the model 
is that when the positive signal consists of a 
component of the neutral background stimulus
the animal should be unable to detect its
onset, since no new fibers and/or larger
neural response should result. Trahiotis and
Elliott (1970) reported on a study of this
sort with cats, in which the neutral stimulus
consisted of repeated tone-noise dyads (the
tone falling outside the noise band), and the
positive stimulus consisted of dyads in which
a tone falling within the neutral noise band 
was substituted for the noise. Four tasks were 
used, as indicated in Table 1. In all cases, the 
repetitive intertrial neutral stimuli consisted 
of dyads whose components consisted of a 
500-Hz. tone which alternated with a high- 
pass band of noise centered at 4000 Hz. It 
can be noted that Neff's model would predict
that Tasks 1 and 3 should be insoluble
for animals with central lesions, since in these
tasks both positive dyads contained as the
new component a 4000-Hz. tone that fell
within the high-pass band of noise, and thus
presumably did not serve to arouse any new
fibers. However, after removal of AI, AII, Ep,
and I-T these animals relearned all the tasks
quite readily. They did not respond to other
new dyads to which they had not been previously
trained to respond, but which did
differ from the neutral dyads. From this, it
would appear that the animals' cue to respond
was not merely the presentation of stimuli
that differed from the neutral stimuli.

TABLE 1

PATTERN-CHANGE-DETECTION TASK

Task   Repetitive intertrial      Positive dyads
       neutral stimulus dyads     (warning signal)
1      500 Hz.    HPB             500 Hz.    4000 Hz.
2      500 Hz.    HPB             LPB        HPB
3      500 Hz.    HPB             4000 Hz.   HPB
4      500 Hz.    HPB             500 Hz.    LPB

Note. High-pass band of noise = HPB; low-pass band of noise = LPB.
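
The prediction drawn from Neff's model for the four tasks of Table 1 can be sketched in a few lines of Python. This is only an illustration of the strict "new fibers" reading of the model as summarized here, not the authors' own analysis. The band edges assigned to the two noise bands (2000 to 8000 Hz and 100 to 1000 Hz) and every name in the code are assumptions made for the example, since the text reports only the 500-Hz. and 4000-Hz. values.

```python
# Sketch of the strict "new fibers" prediction for the four tasks of Table 1.
# Each stimulus component is modeled as a frequency interval (low, high) in Hz.
# The noise-band edges are hypothetical; the text does not report them.
COMPONENTS = {
    "500 Hz": (500, 500),
    "4000 Hz": (4000, 4000),
    "HPB": (2000, 8000),  # assumed high-pass band containing 4000 Hz
    "LPB": (100, 1000),   # assumed low-pass band
}

# Repetitive intertrial neutral dyad, identical in all four tasks.
NEUTRAL_DYAD = ("500 Hz", "HPB")

# Positive (warning-signal) dyads, as listed in Table 1.
POSITIVE_DYADS = {
    1: ("500 Hz", "4000 Hz"),
    2: ("LPB", "HPB"),
    3: ("4000 Hz", "HPB"),
    4: ("500 Hz", "LPB"),
}


def is_covered(component, background):
    """Return True if the component's interval lies inside a background interval."""
    low, high = COMPONENTS[component]
    return any(
        COMPONENTS[b][0] <= low and high <= COMPONENTS[b][1]
        for b in background
    )


def predicted_soluble(task):
    """Strict model: a task is soluble only if the positive dyad contains a
    component exciting frequencies that the neutral dyad does not excite."""
    return any(not is_covered(c, NEUTRAL_DYAD) for c in POSITIVE_DYADS[task])


predictions = {task: predicted_soluble(task) for task in POSITIVE_DYADS}
for task, soluble in predictions.items():
    label = "soluble" if soluble else "insoluble"
    print(f"Task {task}: {label} under the strict model")
```

Under these assumptions the sketch marks Tasks 1 and 3 as insoluble and Tasks 2 and 4 as soluble. That is the pattern the model predicts, and it is the pattern contradicted by the ablated cats' ready relearning of all four tasks.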

Neither of these studies, of course, provides
data in disagreement with Neff's observation 
that postablation tasks involving filled inter- 
trial intervals which can be performed involve 
the presentation of a new signal. However, 
the explanation that the effectiveness of such 
new signals is based merely on the excitation 
of new fibers or the evocation of larger neural 
responses must be questioned. Certainly, it 
appears unlikely that the cue(s) used by 
ablated animals was based exclusively on 
either of the cues specified by the model. 
While the habituation hypothesis certainly 
need not be discarded, it appears probable 
that it is only a partial explanation. A new 
signal may be necessary—possibly for its 
attention-arousing capabilities—but it must 
also possess additional unique cues that en- 
able the animals to differentiate among vari- 
ous new signals and differentiate between 
the neutral noise band and a component of 
that band. 

Another series of studies of somewhat simi- 
lar nature have been carried out and involve 
a delayed matching-from-sample procedure.
The rationale underlying this procedure (Task
12, Figure 1) has been spelled out by Konor- 
ski (1959). He pointed out that the animal 
was required to wait until the second of the 
pair of stimuli was presented in order to com- 
pare the two tones and make the correct 
response. (In this sense, the "pattern" had 
to be recognized.) Since an interval existed 
between the two stimuli of the compound, 
Konorski suggested that recent memory was 
involved. 

Chorazyna and Stepien (1961) used this 
procedure with dogs with a go, no-go, asym- 
metrical food-reinforced procedure. They pre- 
sented different compound stimuli to various 
dogs, namely, compounds composed of two 
pure tones either of similar (S+) or of different
(S−) frequency; two pure tones of
similar (S+) or of different (S−) intensity;
or bell and repetitive clicks. Signals were 2 
seconds long with a 3-second interval. In all 
cases, the positive stimulus compound was 
made up of two identical stimuli, and the 
negative compound was made up of two dif- 
ferent stimuli. Following bilateral ablation of 
the anterior and posterior sylvian gyri, the 
discrimination response was lost, with all dogs 
responding to the negative compound stimuli 
as well as the positive compounds—not sur- 
prising since asymmetrical reinforcement was 
used. With postablation training, the dogs 
improved in their discrimination of the bell-click
and the weak-loud compounds, though
performance apparently did not equal pre- 
ablation performance. However, the dogs pre- 
sented with the compounds involving different 
pitches were unable to relearn the task. 

Stepien et al. (1960) tested African green 
monkeys with the same general procedure, the 
stimuli making up the compound being click 
rates. Thus, when the compound was made 
up of two stimuli of the same click rate it 
was positive, while a compound made up of 
stimuli of two different click rates was nega- 
tive. Stimuli were 2 seconds long with a 
1-second interstimulus interval. Reinforcement
for responses to the positive stimuli
was food, and for responses to the negative 
stimuli was shock. Following initial training, 
the ablation consisting of removal of the neocortex
of the first and second temporal lobes
anterior to the primary auditory area was
made. The ablations were bilateral but per- 
formed serially, with the animals tested be- 
tween the two unilateral ablations. No effects 
from the first ablation were evident. However, 
following the second operation, the animals
were no longer able to discriminate between the 
positive and negative compound stimuli and 
could not relearn the task. When the 1-second 
interstimulus interval was eliminated, per- 
formance improved but never reached pre- 
ablation levels. These animals could, however, 
make simple Task 10 recognitions involving 
bell and buzzer, or high and low click rates,
in which one of the stimuli was always positive
and the other always negative. Further,
they learned to inhibit a response to a positive 
stimulus whenever it was preceded by an 
inhibitory stimulus, the inhibition remaining 
effective even when there was an 8-10-second 
interval separating the inhibitory and positive 
stimuli. Thus, the animals were able to
recognize stimuli, though they were unable to 
discriminate between compounds of them. 
When these animals were again tested after 
a period of 2 years, with a .2-second inter- 
stimulus interval (Cordeau & Mahut, 1964), 
the deficit was found to still exist in two of the
three monkeys; however, the third monkey, in
which the lesion was limited to the superior
temporal convolution, did discriminate adequately,
even when the interstimulus interval
was increased to 5 seconds. Since this animal
did about as well with a 5-second delay,
Cordeau and Mahut suggested that the deficit
in discriminating between stimulus compounds
might not reflect a recent memory deficit, as
had been suggested by Stepien et al. (1960).

Cornwell (1967) studying cats also use! 
compound stimuli made up of a bell anc 
buzzer, the compound being positive when 
both stimuli were the same, and negative 
when the two stimuli differed. Reinforcement 
was symmetrical, with shock reinforcemen 
used for positive compounds and closing 9 
the shuttlebox partition in front of the ber 
of the jumping cat for responses to negative
compounds. Following bilateral ablation of the
I-T areas, the animals functioned at
chance level throughout the 1000-trial post-
ablation testing procedure. It is unfortunate
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that the postablation testing was terminated 
so soon, since the initial training required 
from 3,000 to 7,400 trials to achieve criterion; 
it would appear necessary in studies of this 
sort to use more postablation learning trials
than were required initially before one should 
conclude that learning the discrimination is 
not possible. When a simple recognition test 
using a procedure similar to Task 10 and 
requiring discrimination between an air-jet 
hiss and a series of clicks was presented, the 
animals learned as quickly as control animals; 
when the task required the recognition of 
pure tones of different frequencies, however, 
the animals were unable to learn the task. 
The pattern studies of Neff and his co-
workers, and the compound-stimuli studies of
Konorski and his associates, are in some 
aspects quite similar. The basic task confront- 
ing the animals is to discriminate between 
different orders of presentation of two dif- 
ferent stimuli. The Neff studies involve filled 
intertrial intervals that provide a background 
in which the different tones are continuously 
presented in the same pattern and the animal
is required to detect when the pattern order 
is altered; the ablated animal’s inability to 
carry out this task could result from habitua- 
tion, as Neff et al. (1956) have suggested, 
or it could reflect a failure of recent memory,
as Konorski (1959) and his co-workers
(Stepien et al., 1960) have suggested.
As already indicated above, when the
implications of a strict interpretation of the
habituation hypothesis were tested, they were 
not supported—and while habituation may be 
a partial explanation, it does not appear to 
be completely adequate. Further, the habitu- 
ation hypothesis incorporated in Neff's neural 
model cannot explain the inability of the 
ablated animals to discriminate between com-
pound stimuli used by Konorski and his
associates. On the other hand, Konorski's hy-
pothesis concerning recent memory deficits
must be questioned on the basis of findings 
that ablated animals presented with Task 1 
discriminations, in which interstimulus inter-
vals were 1 second in length, could success-
fully make the required discrimination (e.g.,
Neff & Diamond, 1958). Further, since ani- 
mals unable to discriminate compound stim- 
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uli could recognize different signals in a 
Task 10 recognition situation (which must 
involve long-term memory), it appears un-
likely that a memory deficit was the source
of the difficulty. 

Because of major procedural differences, 
one need not necessarily assume that the dif- 
ficulties evident in Neff's pattern-discrimina- 
tion tasks and in Konorski’s compound- 
stimulus task are reflections of similar deficits. 
Neff’s pattern tasks involve filled intertrial
intervals and require that the animal detect 
changes. Thus, while habituation—or lack of 
“attentiveness”—may adequately explain his
findings that ablated animals cannot detect 
patterning changes, it appears unlikely, on 
the basis of the Brown et al. (1967) study 
and the Trahiotis and Elliott (1970) study, 
that in tasks which the animals can perform 
correctly, the ability depends exclusively on 
the undifferentiated detection of new or larger 
neural events. Rather, with filled intertrial 
intervals, ablated animals may require a test 
procedure which can overcome attentional 
deficits so that discrimination can be made. 
On the other hand, the Konorski (1959) 
study involves silent intertrial intervals so 
that inattention to trials is not a problem. 
With such a procedure, however, each com- 
ponent of the stimulus compound is pre- 
sented only once, and one might conclude 
that ablated animals lacked the ability either 
to develop an immediate appreciation of the 
sensation or the ability to compare the two 
relatively brief stimuli, each of which is pre- 
sented only one time. This interpretation is 
not acceptable, however, when one recalls 
Thompson's (1959, 1960) studies in which, 
on a go, no-go task with silent intertrial 
intervals, ablated animals were unable to dis- 
criminate between trains of pulses which 
alternated between two frequencies (S+) 
and trains of pulses of a single frequency 
identical to one of the S+ frequencies (S—). 
In many respects, Thompson’s task was simi- 
lar to the Konorski task: Positive and nega- 
tive trials included identical components; 
intertrial intervals were silent; and the ani- 
mal was forced to wait until at least two tone 
pulses were presented to determine whether 
the trial was positive or negative. It will also 
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be recalled that in Thompson's study the 
animals could respond correctly to positive 
trials made up of alternating frequencies when 
the intertrial intervals were filled with fixed- 
frequency tone pulses identical with one of 
the S+ alternating frequencies—that is, 
onset of S+ resulted in the presentation of 
a new frequency. Possibly, ablated animals 
need the frame of reference provided by the 
unchanging intertrial interval pulses that are 
identical to one of the components of the S+ 
signal in order to differentiate between S+ 
and S— when S+ and S— consist in part of 
common components. On the other hand, 
when S+ and S— are completely different 
(e.g., bell and buzzer or relatively large fre- 
quency differences) they can be recognized in 
an absolute manner. Thus, Neff’s (1960) 
observation that nondiscriminable tasks for 
the ablated animal involve the presentation 
of trials in which no new signals appear is 
correct, but one should add that ablated ani- 
mals may also be unable to discriminate be- 
tween multiple new signals when they appear 
as parts of both S+ and S—, or when S+ 
and S— consist of overlapping components. 
It should be recognized, of course, that these 
two conditions which lead to nondiscrimi- 
nating performance may reflect quite different 
deficits. 
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COMMENT ON COMPONENT-RANDOMIZATION TESTS 


EDWARD F. ALF¹ AND NORMAN M. ABRAHAMS²


Naval Personnel and Training Research Laboratory, San Diego 


Examination of the discrete randomization t and F tests reveals that in general
they should not be regarded as the appropriate exact tests, which are only
approximated by their parametric counterparts. In very small samples, the
randomization tests will fail to reveal differences easily detected by the ap-
propriate parametric test.


The method of component randomization 
was originally proposed by Fisher (1926). He 
discussed the method in his Design of Experi- 
ments (1960, pp. 44—49) with reservations 
that are not reflected in the writings of 
modern enthusiasts: 


In recent years tests using the physical act of 
randomization to supply (on the Null Hypothesis) a 
frequency distribution, have been largely advocated 
under the name of “Non-parametric” tests. Some-
what extravagant claims have often been made on
their behalf. The example of this Section, published
in 1935, was by many years the first of its class.
The reader will realize that it was in no sense put
forward to supersede the common and expeditious
tests based on the Gaussian theory of errors [p. 48].


"This contrasts markedly with the sentiments 
of some contemporary writers, who give the
impression that parametric t and F tests
merely provide approximations to the correct 
component-randomization tests and that only 
the mathematical cumbersomeness of these 
latter tests prevents their general use. For ex- 
ample, McHugh (1963) stated: 


The relationship of the continuous F distribution to 
the exact discrete randomization F distribution is
analogous to the case, well known to psychologists, of
the relationship of the continuous normal distribution
to the exact binomial distribution—the latter rela-
tionship being familiarly referred to as the “normal
approximation to the binomial [pp. 350-351].”

The purpose of the present comment is to 
point out that the analogy above is quite false.


In large samples, the power of a component- 


1 Also at San Diego State College. 
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randomization test often approaches that of 
its corresponding parametric test (Hoeffding, 
1952). However, in small samples the com- 
ponent-randomization test and its parametric 
counterpart yield similar results primarily 
when the null hypothesis is true, and the two 
tests will yield more and more dissimilar re- 
sults as the experimental effect becomes more 
and more pronounced. 

To understand how very different the two 
types of test can be, consider a simple limit- 
ing case where the degrees of freedom are 
minimal and the effect under investigation is 
very great. Table 1 presents the weights at 
birth and at age 16 of two boys drawn from 
a population of interest. We wish to know 
whether there has been a significant gain in 
weight. 

If we perform a conventional t test on
these differences in Table 1, we find t = 38.4,
approximately. The differences are thus sig- 
nificant at the .01 level, using a one-tailed 
test of significance. 
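
The arithmetic behind this value can be checked with a short sketch. It assumes the two differences given in Table 1 (159.6 and 151.5); the function names are illustrative only and do not come from the original comment. With two pairs there is 1 degree of freedom, and Student's t with 1 degree of freedom is the Cauchy distribution, so the one-tailed probability can be written in closed form with the arctangent.

```python
import math

# Differences (age 16 minus birth) as given in Table 1.
differences = [159.6, 151.5]


def paired_t(diffs):
    """Return the paired-samples t statistic for a list of differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(variance / n)


def one_tailed_p_df1(t):
    """Upper-tail probability of Student's t with 1 df (a Cauchy distribution)."""
    return 0.5 - math.atan(t) / math.pi


t_value = paired_t(differences)
p_value = one_tailed_p_df1(t_value)
print(f"t = {t_value:.1f}, one-tailed p = {p_value:.4f}")
```

Under these assumptions the statistic agrees with the reported t of about 38.4, and the one-tailed probability falls just below .01.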

Now, let us perform a component-ran-
domization t test on these same data. Briefly,
the component-randomization test asks the
question, “If, for each individual, the two
weights were distributed randomly to the
‘age 16’ and ‘at birth’ conditions, what is the
probability of a mean difference equal to or 


TABLE 1


WEIGHT OF TWO BOYS AT BIRTH AND AT AGE 16


Boy    Weight at age 16    Weight at birth    Difference
 1          165.7                6.1             159.6
 2          160.3                8.8             151.5


Note. Mean difference = 155.55.
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TABLE 2 


EQUALLY LIKELY MEAN DIFFERENCES UNDER
THE NULL HYPOTHESIS


Mean difference        t
     155.55          38.4
       4.05            .026
      -4.05           -.026
    -155.55         -38.4


greater than the one obtained?” If this prob-
ability is acceptably small, the null hypothesis
is rejected. 

For the data in Table 1, there would be 
2² = 4 possible and equally likely arrange-
ments under the null hypothesis. The mean
differences for the four possible arrangements 
are presented in Table 2, together with their 
associated parametric t values, which were
computed simply for illustrative purposes. 

Using the data in Table 2 to perform the 
randomization test, we see that a mean dif-
ference corresponding to a t value of 38.4 or
greater would occur one time in four (p =
.25) if the null hypothesis were true. Thus the
null hypothesis is not rejected. 
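
The enumeration just described can be sketched as follows. The sketch assumes the same two differences from Table 1 and represents random reassignment of each boy's two weights as a change of sign on his difference; the function name is hypothetical and chosen only for illustration.

```python
from itertools import product

# Differences (age 16 minus birth) as given in Table 1.
differences = [159.6, 151.5]


def randomization_p(diffs):
    """One-tailed p: share of sign arrangements whose mean is >= the observed mean."""
    n = len(diffs)
    observed = sum(diffs) / n
    # Swapping a pair's two scores flips the sign of that pair's difference.
    means = [
        sum(sign * d for sign, d in zip(signs, diffs)) / n
        for signs in product((1, -1), repeat=n)
    ]
    extreme = sum(1 for m in means if m >= observed - 1e-12)
    return extreme / len(means), sorted(means, reverse=True)


p_value, means = randomization_p(differences)
print(means)
print(p_value)
```

With only two pairs there are four arrangements, so the smallest attainable one-tailed probability is 1/4, however large the observed difference.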

Simple enumeration will reveal that the 
component-randomization t test cannot pos-
sibly result in the rejection of the null hy-
pothesis at the .05 level, even for a one-tailed
test of significance, unless there are at least 
five matched pairs. In order to reach the .01 
level for a two-tailed test, even in principle, 
one must use at least eight matched pairs, or 
a total of 16 observations.
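
These minimum sample sizes follow from counting arrangements. A minimal sketch, assuming that the most extreme arrangement is unique (no zero differences), that a two-tailed test counts both extremes, and using a hypothetical helper name:

```python
def min_pairs(alpha, tails=1):
    """Smallest number of matched pairs n for which tails / 2**n <= alpha."""
    n = 1
    # With n pairs there are 2**n equally likely arrangements, so the
    # smallest attainable probability is tails / 2**n.
    while tails / 2 ** n > alpha:
        n += 1
    return n


one_tailed_05 = min_pairs(0.05, tails=1)
two_tailed_01 = min_pairs(0.01, tails=2)
print(one_tailed_05, two_tailed_01)
```

Five pairs give 1/32 (about .031) for a one-tailed test, and eight pairs give 2/256 (about .008) for a two-tailed test, in agreement with the figures in the text.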

As we have seen in our example, the para-
metric t test is subject to no such restriction.
With a very pronounced experimental effect,
it is possible to obtain a significant t at any
predesignated level of significance with only 
two pairs of observations. 

The major limitation of the randomization t
test arises when the null hypothesis is in fact 
false. Under this condition, all the variance 
due to the experimental effect is entered into 
the error term, and differences are found that 
would be extremely unlikely if the null hy-
pothesis were true. As n increases indefinitely,
this limitation becomes unimportant, since the 
probability of any prespecified mean difference 
other than zero will become indefinitely small. 
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However, when z is small, this limitation can 
seriously undermine the effectiveness of a com- 
ponent-randomization test, as shown in the 
foregoing example.

The investigator who anticipates using a
component-randomization test would do well
to consider Fisher’s reservations (1960):


The utility of such non-parametric tests consists in
their being able to supply confirmation whenever,
rightly or, more often, wrongly, it is suspected that
the simpler tests have been appreciably injured by
departures from normality.

They assume less knowledge, or more ignorance, of
the experimental material than do the standard tests,
and this has been an attraction to some mathema-
ticians who often discuss experimentation without
personal knowledge of the material. In inductive
logic, however, an erroneous assumption of ignorance
is not innocuous; it often leads to manifest ab-
surdities [pp. 48-49].


In summary, we can say that when the as-
sumptions of the parametric t and F tests are
met, and when the number of observations is
large, then the component-randomization t
and F tests will give results comparable to
their parametric counterparts, particularly
when the experimental effect is not a large
one. Therefore, if an obtained result is only
barely significant, and if one feels uncertain
about the assumptions of the parametric test
that has been made, then one may wish to
perform a very tedious component-randomiza-
tion test in order to allay one's fears. On the
other hand, if the parametric assumptions are
tenable, the number of observations is small,
and the experimental effect is large, one would
be foolish to employ a component-randomiza-
tion test, since such a test would be likely to
fail to reveal the true differences that were
present.
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The research of the young infant's re 
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sponses to visual patterns is examined, 


with an emphasis on what these responses indicate about the capacity of form
perception in the infant. Both visual scanning patterns and discrimination data
are treated. The review includes a presentation of theoretical positions as well
as a consideration of some of the problems and techniques involved in this
research. The main conclusion is that there seems to be no evidence to date
which necessitates rejection of the prevailing view that the perception of the
infant is qualitatively similar to that of the adult.


This review examines the evidence which
bears on the capacity for form perception in
the young human infant. Attention is given
primarily to research on infants from birth
to 6 months of age. In order to delimit this
discussion, it is necessary to consider first
what is included under the term “form per-
ception.” The term has reference to both a
particular content and a particular process.
The content of form perception consists of
such stimuli as shapes and patterns, and
global aspects of stimulation such as arrange-
ment of parts and wholeness of figure. The
content also includes the physical elements of
these stimuli, that is, contours, angles, and
the like. The process referred to here concerns
methods of selecting and rejecting informa-
tion or content. Of interest in this context are
orientation toward and away from a visual
stimulus, the character of the visual scan of
the infant, and the possible roles of focal and
peripheral vision. A complete discussion of
processing should include not only a treat- 
ment of the selection of information but also 
such later functions as storage and retrieval 
of material, as well as a consideration of the 
structures responsible for reception, integra- 
tion, and utilization of content. However, this 
review concentrates mainly on the stage of 
information gathering, with some treatment 
of possible structures as seen in the work of 
Jerome Kagan. 


BACKGROUND 


A full appreciation of the significance of 
the advances made in the last 15 years in 
understanding the visual perception of the 
infant is impossible without a brief considera- 
tion of the old conceptions of the infant's 
visual world. Although William James’ “great 
blooming, buzzing confusion" has become a 
hackneyed phrase, it does convey a feeling 
of the lack of organization and differentiation 
that was thought formerly to characterize the 
visual world of the infant. Fantz (1958) 
asked whether the confusion is alleviated by 
the genesis of perceptual organization, that 
is, are infants sensitive to configurations of 
stimuli? 

In addressing himself to this question, 
Fantz (1956) developed an ingenious method 
for measuring visual fixation on a stimulus. 
His success in demonstrating that young 
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infants (even neonates) show preferences for 
(i.e., look longer at) certain kinds of forms
over others led him to conclude that “some 
degree of form perception is innate [Fantz, 
1961, p. 68]" in infants, for “if an infant 
consistently turns its gaze toward some forms 
more often than toward others, it must be 
able to perceive form [Fantz, 1961, p. 67]." 

The current opinion among most investi-
gators (e.g., Charlesworth, 1968; Gibson,
1969; Hershenson, 1967) is essentially simi-
lar to the conclusion reached by Fantz. The
prevailing view recognizes that what, in the
adult, is called form perception may be a
process qualitatively similar to the neonate's 
response to the visual world, whether or not 
one wants to designate this latter process by 
the same name. In order to decide if this is 
indeed the case, it would be useful to have 
a criterion, that is, some measure that could 
be applied at all ages by which one could 
infer a basis for form perception. Criteria 
have been offered by several investigators, one 
of the least stringent and most appealing by 
Hershenson (1967). His suggestion is that in 
order to determine whether a subject has 
form perception we must know whether he 
responds to forms as wholes. The difficulty 
with this criterion is that it is not an easy 
task to define what this means operationally, 
particularly in very young organisms with 
limited response potential. Therefore, while
the idea of response to forms as wholes will 
be kept in mind as a guideline in discussing 
some of the data, the impossibility of using
it as an absolute criterion must be realized. 
Indeed there is a possibility that no opera- 
tional measure may be entirely satisfactory. 
Instead, attention is focused on just what 
we can learn about the infant’s perceptual 
characteristics from his reactions to various 
elements and dimensions of stimuli with 
formlike characteristics. 


THEORETICAL POSITIONS


There are several theoretical lines of 
thought, mostly borrowed from the study of 
older organisms, that have been or could be
applied to infant perception. As stated above,
the prevailing view is that infant perception
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may be qualitatively similar to that of the 
adult. This seems to imply that some aspects 
of perceptual functioning are given, but that 
some kind of development does take place. 
The emphasis for most theorists is on develop- 
ment, and various positions differ with respect 
to how it is characterized. On the other hand, 
the gestalt theory of form perception, while 
it does not deny that the perceptual world of 
the infant expands through experience, places 
the emphasis on what is inborn, and regards 
the organism as essentially perceptual from 
the start. 

The gestalt theory of perception (Koffka, 
1928) has long been regarded as the classic 
nativist position. For the gestaltists the prin- 
ciples of perceptual organization are direct 
and primitive, that is, perceptual experience 
is not chaotic but is organized at birth. The 
first phenomena experienced by the infant are
qualities, or figures, upon a ground. In this 
sense the most primitive phenomena are not 
elements in isolation but configurations whose 
parts are in constant and essential interaction. 
While the perceptual world of the organism 
grows with both maturation and learning,
form perception is given in the sense that the
organism immediately reacts to stimuli in
orderly relationships to one another. The ele-
ments of perception are whole, shaped regions
set off by contours. Patterns are constructed
by the observer, not by gradual combination
of elements of form through associative ex-
perience, but through a dynamic and immedi-
ate process of self-arrangement of fields in
the brain.

The theory of Hebb (1949) has long postu-
lated that certain features or elements of the
stimulus are “picked up” in units and are
gradually integrated into wholes through as-
sociative experience, the mechanism consisting
of changes in linkages and strength of neural
structures. Hershenson (1967) noted the re-
cent renewal of interest in this “building-
block” idea of the development of form
perception, and attributed the revival to sug-
gestions that there are, in several animals,
detectors of specific kinds of stimulation such
as contours, angles, and lines (Hubel &
Wiesel, 1962).


Hershenson has called attention to another
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notion based on the work of Ames and Silfen? 
in which the very young infant is character- 
ized as being "captured by the stimuli," 
whereas the older infant seems to be con- 
trolling his input or "capturing the stimuli by 
his visual behavior." Thus, in Hershenson's 
view, progress in perception of form is shown 
in the development of two parallel systems. 
One system consists of primary elements that 
eventually combine to form functionally au- 
tonomous structures, while the other is em- 
bodied in a shift from obligatory attention to 
stimuli to greater control over the stimuli 
attended to.

A somewhat different view of perceptual 
development is espoused by E. J. Gibson 
(1969). Rather than the development of per- 
ception consisting of the organization of ele- 
ments into wholes, for Gibson the process is 
one of differentiation of distinctive features 
from the total, presumably undifferentiated, 
visual field. The organism learns not combi- 
nations of elements but critical features and 
invariant relationships in stimulation. Gibson 
contends, as well, that information regarded 
as relatively “sophisticated,” such as stimula-
tion indicating depth, is available to the infant
from the start. Since the information neces- 
sary for perception is contained in the stimu- 
lus, the infant should be able to “pick up” 
any kind of perceptual information limited 
only by his capacities of differentiation. 
"Through experience the organism learns to use 
more information or more cues in stimulation 
and to differentiate “higher-order” variables. 
Much of the responsibility for perceptual dif- 
ferentiation lies with the activity of the orga- 
nism, and the infant (at least after a very 
young age) is viewed as an active seeker of 
information rather than a passive recipient 
of stimulation. 

The broadest coverage of and the most 
research on the role of activity in percep- 
tion is given by the Russian investigators 
(Zaporozhets, 1965; Zinchenko, Chzhi-tsin, 
& Tarakanov, 1962) who espouse a motor- 


3 E. W. Ames and C. K. Silfen. Methodological
issues in the study of age differences in infants'
attention to stimuli varying in movement and com-
plexity. Paper presented at meeting of Society for
Research in Child Development, Minneapolis, Min- 
nesota, March 1965. 
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copy theory of perception. The focus for the 
Russians is an examination of the changes in 
the quality of “perceptive images” that are
brought about by changes in the effectiveness 
of orienting, exploratory, and modeling move- 
ments. Thus, perceptual ability depends on 
the sophistication of the individual's methods 
of gathering information, and changes in per- 
ception occur as a result of changes in meth- 
ods of activity. For our purposes, it is the 
modeling function of activity that demands 
attention. The Russians contend that the de- 
velopment of this capacity is such that the 
movements of the hand or the eye of the child 
eventually follow the outline or contour of 
the object rather than simply investigating by 
shifting from part to part. 

For Fantz (1967) perceptual development 
consists of the "acquisition of knowledge" 
about the environment. While this view is 
not in disagreement with that of Gibson, there 
is a difference in emphasis; for Gibson the 
most important point is the process of dif- 
ferentiation, while Fantz is concerned pri- 
marily with how the organism's selection ca- 
pacities influence what knowledge is acquired. 
Again, while Fantz would not argue with the 
contention that activity is important for per- 
ceptual development, he is concerned about 
making clear that acquisition of knowledge 
can take place apart from specific changes in 
response. According to Fantz, one may assume 
that information has been taken in through 
the sense whenever there is a learned change 
in response tendency, but one is not reducible 
to the other and cannot always be shown by 
the same data. Thus, perceptual learning may
occur without the learning being shown by a 
change in overt behavior. The importance of 
visual experience is in the development of the 
capacity to receive and discriminate stimuli 
and the development of the act of attending 
to a stimulus, this latter point being reminis- 
cent of the Russian position on orienting 
activity. In terms of the mechanisms involved, 
Fantz espoused an eclectic view. Improvement 
of afferent discriminatory processes (as in 
Gibson) occurs along with better organization 
of input (as in Hebb). 

Kagan (1967) offered what he terms some 
“low level theoretical ideas” in order to orga- 
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nize his findings, particularly with reference 
to responses to faces. His emphasis is on the 
construction and alteration of internal repre- 
sentations of experience that form a basis for 
such functions as memory. Kagan admits that 
a few unique stimulus characteristics will in- 
variably attract the newborn’s attention. Thus 
the infant is equipped with a bias toward 
selecting certain kinds of information, and 
this will influence the kinds of stimuli that 
are represented. This emphasis is similar to 
that of Fantz, though Kagan is more con- 
cerned with describing the kind of representa- 
tion that takes place. Representation is in 
the form of a “schema,” a term introduced by 
Bartlett (Vernon, 1955) and used extensively 
by Piaget (1952). Kagan (1970) differenti- 
ated his schema from that of Piaget by say- 
ing that, while for Piaget the schema includes 
both the internal representation and the orga- 
nized action toward it, for Kagan the term 
has reference only to the representation.
Beyond the few stimuli that have the univocal 
power to attract attention, the infant's atten- 
tion depends on the relationship of the stimu- 
lus to the child's schema. Maximal attention 
is elicited by stimuli representing newly 
emerging schemata and by stimuli consist- 
ing of moderate departures from established 
schemata. At a later age, the child also
attends to stimuli which activate hypotheses.

When one attempts to review the literature 
with the idea of accepting or rejecting one 
or more of these theories, one encounters two 
difficulties: First, the theories are often quite 
vague as to how they differ from one another, 
that is, as to what specific predictions they 
would offer as to experimental outcome. Sec-
ond, much of the research done in this area
has been carried out without reference to a
particular theory. It has been felt by many
investigators that we simply are not ready
for a comprehensive theory to encompass our
pioneering efforts on the development of form
perception. For these reasons there is no
strict attempt to place each piece of research
in categories supporting or rejecting each
theory, although, where appropriate, some of
the studies are discussed with respect to the
relevance they have for a certain theoretical
position.


BASIC VISUAL ABILITIES


In order to explore the ability of the infant 
to perceive form, it must first be determined 
whether the visual system of the infant is 
sufficiently mature for such perception to take 
place. This topic was reviewed by Hershenson 
(1967), and the reader is referred to this 
source for details. Hershenson's conclusion as
to whether the infant shows the basic visual
abilities necessary for form perception was
that “clearly there is no evidence that argues
against it [Hershenson, 1967, p. 331].” There
has been no further research since that time 
that would alter this conclusion. 

There are two areas in which the data have 
been in some conflict. In the area of conjuga- 
tion, Hershenson (1964) found that newborns 
usually fixated the stimulus with both eyes 
whereas Wickelgren (1967) concluded that 
for the most part her neonates’ eyes did not
converge on the same stimulus. The fact is,
though, that apparently newborns do have
this ability to some degree, and the question
is "the relative frequency of convergence ke 
opportunity to converge [Hershenson, 1967,
p. 330].” Therefore, this presents some
problem when working with neonates.

The second area of conflict concerns ac-
commodation. Fantz, Ordy, and Udelf (1962)
felt that since they found no differences in
acuity at different distances, the ability to
accommodate for near vision is present in
early infancy. Haynes, White, and Held
(1965) found, however, that infants less than
one month old did not adjust accommoda-
tory responses to changing target distance.
Hershenson (1967) contended, though, that
the Haynes et al. target may not have been
an adequate stimulus for accommodation at
all distances. It is possible also that the
Haynes et al. technique (dynamic retinos-
copy) was measuring something different from
what was assessed behaviorally by Fantz et
al. (1962). At any rate, whatever the ability
or lack of ability of the newborn to accom-
modate, the behavioral fact is that his visual
acuity is quite good for targets presented
within a circumscribed range of distances
(Gorman, Cogen, & Gellis, 1967).


VISUAL SCANNING PATTERNS


The research related to form perception in
infants falls into two main categories. The
first centers around analysis of visual scan-
ning patterns. The second, more extensive,
category is called response to pattern charac-
teristics. This includes studies in which rela-
tively gross responses are employed as indexes
of discrimination among, preference for, or
equivalence of visual patterns.

Newborns 


One of the earliest studies of visual scan-
ning patterns is the well-known experiment 
of Salapatek and Kessen (1966) in which 
subjects 4-7 days of age were presented with 
a large solid black triangle at a distance of 9 
inches. Subjects had both eyes open, but only 
one was photographed. It was found that the
infants rapidly localized a vertex of the tri-
angle and executed a cyclical, mainly hori-
zontal scan across this feature for the dura-
tion of the exposure. The interpretation was 
that visually naive infants select only a single 
feature (an angle) for inspection. A few years 
later the same investigators (Salapatek & 
Kessen)⁴ tested the hypothesis that the num-
ber of features selected could be increased
through massive exposure of the stimulus. 
The same triangle was presented to newborns 
for prolonged periods within and across days, 
but it was found that multiple-feature selec- 
tion did not occur as a result of massive ex- 
posure alone. Some subjects, however, could 
alternate between single-feature selection and
a well-directed multiple-feature scan.

Another study by Salapatek (1968) sought
to expand the investigation by looking at the
role of size, angularity, and figure-ground
contrast on the (monocularly recorded) response
of newborns to circles and triangles.
He found that 50%-70% of the newborns
selected less than 50% of the figure for prolonged
investigation. The parts of the figure
selected involved angular, circular, or linear
contour, but the experimenter did not obtain
clear-cut selection of a single feature.
Salapatek explained that in his study pulsing
infrared lights in their visual field may have
distracted the subjects. However, due to the
mixed findings the characteristic of single-feature
selection as applicable to the newborn
is somewhat questionable.


⁴ P. Salapatek and W. Kessen. Prolonged investigation
of a plane geometric triangle by the human
newborn. Paper presented at the meeting of the Society
for Research in Child Development, Santa Monica,
April 1969.


Fig. 1. Horizontal and vertical black-white split
fields used by Kessen, Salapatek, and Haith (see
Footnote 5).

In another experiment (Kessen, Salapatek, 
& Haith)⁵ newborns were presented with
horizontal and vertical black-white split fields 
(see Figure 1), and the scanning of the right 
eye was observed. The results showed clear 
selection of the vertical transitions for pro- 
longed, horizontal-cyclical visual scanning, but 
no clear choice of the horizontal transitions. 

Kessen et al. (see Footnote 5) and 
Salapatek⁶ reported that, given a homogeneous
surface, the newborn scans broadly in the
horizontal and vertical dimensions, but more 
broadly in the horizontal. When a figure is 
introduced into the field, the dispersion of 
the scan is greatly narrowed; according to 
Salapatek (see Footnote 6) the subject fix- 
ates a contour within 3-4 seconds. Appar- 
ently, there is something about the stimula- 
tion offered by a vertical line (brightness 
transition) and a vertex of a triangle that is 
highly compelling to an organism that pos- 
sesses a relatively good facility for executing 
a horizontal scan. It is interesting that the 
scan remains essentially horizontal even in 
the presence of the figure. One possible expla- 
nation for such an attraction to lines and 


⁵ W. Kessen, P. Salapatek, and M. Haith. The
visual response of the human newborn to horizontal
and vertical linear contour. Paper presented at the
meeting of the American Psychological Association,
Chicago, September 1965.

⁶ P. Salapatek. The visual investigation of geometric
pattern by the one- and two-month old infant.
Paper presented at the meeting of the American
Academy for the Advancement of Science, Boston,
December 1969.
vertexes is the presence of neurophysiological
coding mechanisms specifically tuned to such 
stimuli as contours and angles. Such contour 
operators have been described for the visual 
system of cats (Hubel & Wiesel, 1962) and 
monkeys (Wiesel & Hubel, 1966). 

Haith⁷ described in capsule form the programming
of the newborn, in which the neonate
is regarded as an active organism searching
in an organized manner for certain kinds
of stimulation: (a) if alert, and light is not
too bright, open up (eyes); (b) if eyes open,
but see no light, search; (c) if see light but
no edges, keep searching; (d) if see edges,
hold and cross. He felt that several successive
eye movements might be determined at a
particular moment in time under the control
of a directing strategy, that is, a kind of
plan under which several movements are
"programmed" for execution.

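The four rules attributed to Haith can be read as a small, priority-ordered
decision procedure. The sketch below is only one possible reading, offered for
illustration: the function name `next_action`, the boolean inputs, the action
labels, and the fallback "eyes stay closed" (a case the rules do not address)
are all assumptions introduced here, not part of Haith's description.

```python
def next_action(alert, light_too_bright, eyes_open, sees_light, sees_edges):
    """Return a label for the newborn's next scanning action.

    Hypothetical encoding of rules (a)-(d); every name here is illustrative.
    """
    if not eyes_open:
        # Rule (a): if alert, and light is not too bright, open up (eyes).
        if alert and not light_too_bright:
            return "open eyes"
        # Not covered by the rules; assumed fallback for this sketch.
        return "eyes stay closed"
    if not sees_light:
        # Rule (b): eyes open, but no light seen, so search.
        return "search"
    if not sees_edges:
        # Rule (c): light seen but no edges, so keep searching.
        return "keep searching"
    # Rule (d): edges seen, so hold and cross.
    return "hold and cross"
```

Ordering the tests this way makes each rule apply only when the earlier
conditions have already been met, which is how the list reads; the rules
themselves do not say how several movements would be planned in advance.
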
That the newborn's scanning is at least 
partially directed is suggested by Salapatek's 
(see Footnote 6) description of how the new- 
born localizes a stimulus segment. In this 
account, we are introduced to the utility of 
peripheral vision in the information gathering 
of the newborn. Salapatek contended that 
localization often includes visual traversing 
across some portion of the figure, not selected 
for fixation, during approach to the segment 
selected for fixation. This suggests that the 
newborn possesses directionally appropriate localizing
tendencies toward peripheral stimulation
and that the neonate is capable of some
peripheral discrimination of patterns, which
makes it possible to determine which particular
feature will be approached.

In terms of maintenance of fixation on a 
feature, Salapatek (see Footnote 6) observed 
that, once a segment is localized, fixation is 
maintained for a prolonged period. This is 
reminiscent of studies using grosser techniques 
in which the young infant is described as 
being "captured" by the stimulus (Ames &
Silfen, see Footnote 3; Stechler & Latz, 
1966). Just what determines a change in fixa- 
tion (e.g., habituation or some other factor), 
and whether several such changes are planned 

⁷ M. M. Haith. Visual scanning in infants. Paper
presented at the regional meeting of the Society for
Research in Child Development, Clark University,
March 1968.
in advance, are questions impossible to answer
at the present time. 


Older Infants 


In two studies, Salapatek (see Footnote 6) 
studied the progress of visual scanning in 
infants of approximately 1 and 2 months
of age. In the first study, subjects 4-6 
weeks (younger subjects) and subjects 8-10 
weeks (older subjects) were presented (bin- 
ocularly) with five white-outline geometric 
figures, and recordings were made from the 
right eye. Younger infants showed most fixa- 
tions toward the contour with few toward the 
center of the figure, and fixated a very lim- 
ited portion of the contour (usually an angle, 
if present). In addition, they fixated more on 
the right half of the figures. Since recordings 
were made from the right eye only, this “right 
bias” may have been due to the young in- 
fant’s difficulty in convergence. By compari- 
son, the older subjects fixated to the right 
but less so than the younger subjects and 
fixated more toward the center of the figure 
than did younger subjects, though most fixa- 
tions were still on a contour. They also fixated 
a greater portion of the contour, again usually
on angles. As with the younger subjects,
this held for simple and "complex" figures.

The stimuli in the second study contained
internal and/or external features (see Figure
2). Since it has been shown that young infants
tend to fixate to the right of figures
(with the right eye), Salapatek displaced the
stimuli one inch to the left of the subject's
central visual field. The results seem to indicate
a shift in attention toward internal rather
than external features between 4 and 10
weeks. However, whether this is a real attentional
preference shift or a product of the
maturation of the capacity to converge is
questionable. The fact that younger infants
tend to fixate to the right side of the figure
with the right eye (even, at times, when
there is no external contour present) suggests
the possibility that the left eye may be
scanning the part of the figure to the left
(i.e., in most cases, the internal features).
It would be interesting to see what would
happen if the figures were displaced to the
subject's right.
While the report of infants shifting their
gaze from external to internal features with 
age appeared to represent a qualitative change 
in the kind of information to which infants 
attend, doubt is cast on this interpretation 
due to complications induced by recording 
methods and the problem of convergence of 
both eyes on a single stimulus. Therefore, it 
appears that the only well-documented change 
with age is a quantitative shift from fixation 
of a limited portion of the figure (not neces- 
sarily a single feature) to scanning of a more 
extensive portion. 

In terms of Hershenson’s (1967) guideline 
of response to a whole figure as a basis for 
form perception, one might assume that scan- 
ning of only a part of the figure indicates that 
the young infant does not “take in” the en- 
tire form. However, such a position is tenable 
only if a strict motor copy theory is adopted; 
that is, that one gathers information about 
an entity adequate for recognition only by 
active scanning of all its parts. It is just as
likely that information can be gathered about
portions of figures through “passive” looking
with little eye movement (Charlesworth,
1968) and through peripheral selection and
filtering.


The data on visual scanning show us that 
an infant responds selectively to elements of 
form from birth, and that with age he ex- 
amines a more extensive portion of an entire 
figure through active scanning. What accounts 
for this quantitative change is unknown, but 
it probably depends partially on the matura- 
tion of the ability to scan with facility in 
different directions. The data further characterize
the infant as an active seeker of stimulation
from the start, and as an organism
capable of dealing with more information as
he matures. These latter characteristics are
consonant with the views of most current
theorists but are stated most explicitly by
Gibson (1969).


RESPONSE TO PATTERN CHARACTERISTICS


The main response used in assessing the
infant's interest in patterns has been the
visual fixation or visual looking response.
Perhaps the first relatively formal use of this
response was made by Stirnimann (1944;
cited in Fantz, 1961) who held cards up to
the eyes of infants 1-14 days of age and
observed that they preferred patterned cards
to plain colors. Fantz (1956) subsequently
devised an apparatus for the relatively reliable
observation and recording of fixation.
Other responses besides fixation have been
used as indexes of interest in visual stimuli.
Among these are smiling, heart rate, sucking,
and vocalization. These are discussed in
context as used by various experimenters.


Fig. 2. Stimuli containing internal and/or external
features used by Salapatek (see Footnote 6).


General Preference for Pattern 


The rather surprising discovery made by
Stirnimann that infants prefer patterns to
plain surfaces has been documented many
times since. In one of his earliest studies,
Fantz (1958), for example, observed that in- 
fants from 1 week to 6 months of age pre- 
ferred a red-and-white checkerboard to a plain 
red square. In two other studies, Fantz 
showed that newborns (Fantz, 1963) and in- 
fants over 2 months old (Fantz, 1961) pre- 
ferred black-and-white patterns (schematic 
face, bull’s-eye, and newsprint) to plain red, 
yellow, and white stimuli. Again, Fantz 
(1965) showed that subjects under 2 weeks 
old preferred pattern over gray. Similarly, 
Spears (1964) presented 4-month-old infants 
with stimuli varying in color and/or shape 
and found that shape preceded color as a basis 
for preference, at least in the cases where 
the stimuli varied in both color and form. It 
is clear, then, that infants from 1 day to at 
least 6 months of age generally prefer to look 
at patterned rather than plain surfaces. 


Circularity and Solidity 

Curvilinearity and three-dimensionality are 
two characteristics for which Fantz has 
claimed developmental transitions. Fantz 
(1958) showed that infants under 2 months 
old preferred a striped pattern over a bull’s-eye,
whereas infants over 2 months old preferred
the bull’s-eye to the stripes. Fantz
(1967) reported a further investigation in
which the patterns were }-inch-wide black line 
segments in four different arrangements— 
linear horizontal, wheel spokes, random, and 
bull’s-eye. In the bull’s-eye pattern, not only 
was the arrangement circular but also the 
lines themselves were slightly curved. Sub- 
jects under 1 month of age showed some pref- 
erence for the linear pattern, while only the 
bull’s-eye started and remained at a high level, 
and was much preferred by subjects over 2 
months old. It seems that a circular arrange- 
ment (wheel spokes) of stimuli alone, while 
it may be discriminated from other arrange- 
ments, is insufficient to produce a strong pref- 
erence or even to maintain interest with 
increasing age. 

The finding that infants under 1 month of 
age prefer a linear arrangement of lines is 
questioned by a replication of the above 
study (Fantz & Nevis, 1967) in which in- 
fants under 5 days of age showed no dif- 
ferentiation among the four patterns. The 
finding is questioned further by a study 
(Fantz, 1965) in which pattern organization 
was varied by placing black squares of the 
same total area in five arrangements on a 
white background. There was very little dif- 
ference in newborns’ fixations to stimuli ar- 
ranged linearly or randomly, although the 
most linear arrangement showed a decrease 
and the most random arrangement an in- 
crease over the first 6 months of life. A final 
study (Fantz & Nevis, 1967), using four ar- 
rangements of white squares on a blue back- 
ground, again showed no early preference for 
any arrangement. Also, at later ages (1-4 
months) the circular arrangement was no 
more highly preferred than two noncircular 
ones, again pointing to the possibility that 
infants are most responsive to curved lines 
rather than to curved arrangements of lines. 
In general, the data show a decrease in interest
with age in certain patterns composed of
straight lines but not in others. In the determination
of attention value, the straightness
versus the curvature of the line may interact
with both arrangement and amount of contour.

[...] reported that subjects over [...]
preferred a solid model of a [...]
flat form, while subjects under [...]
months old showed a reversed preference,
whether the stimuli were viewed monocularly
or binocularly. But Fantz (1967) contended 
that the flat object produced greater light 
reflectance, and, therefore, the subjects could 
have been making the discrimination on the 
basis of differences in brightness. It is dif- 
ficult, however, to account for the preferences 
shown on this basis. That there is a preference 
for solidity at a certain age is suggested by 
Fantz’s (1966) report that a textured sphere 
is looked at longer than a similarly textured 
circle by subjects from 1 to 6 months of age. 
Also, subjects over 2 months (Fantz, 1967) 
preferred to look at a patterned surface 
slanting toward them over a similar flat 
vertical surface. Gibson (1969) would say
that the kind of information in solidity is
available to the infant from the start, and
the fact that very young infants can discrimi- 
nate two-dimensional from three-dimensional 
stimuli indicates that this may be the case. 


Stimulus Change 


Stimulus change seems to be one stimulus
characteristic that has great power to attract
the attention of the young infant. Ames and
Silfen (see Footnote 3) showed moving and
stationary checkerboard designs to subjects
7-24 weeks of age. The fact that the younger
subjects seemed to be "captured by the
stimuli" was especially apparent for moving
displays. Haith (1966) showed newborns
(3-5 days) intermittent changing stimuli
consisting of sequentially illuminated lights
and observed that the rate of nonnutritive
sucking was clearly suppressed by
stimulus alteration.

Cohen (1969) found differential
responses to differentially changing stimuli.
His infants fixated longer on lights of intermediate
change (illuminated across four positions)
than on lights with greater (8 or 16
positions) or lesser change (stationary). In a
second experiment, Cohen repeatedly presented
two lights simultaneously and found
that infants preferred changing to stationary
lights, with the greatest preference on early
trials for a light changing among 4 positions
and on later trials for a light changing among
16 positions. These studies, therefore, show
that not only is the infant attracted to
changing stimuli but that his response is
differential depending upon the amount of
change and his experience with the task. 

A final study (Tauber & Koffler, 1966) 
shows that infants are also responsive to ap- 
parent motion. These investigators demon- 
strated that subjects from 10 hours to 4 days 
old reacted with optokinetic nystagmus to the 
apparent movement, simulated through stro- 
boscopic flashes, of a striped field. They con- 
cluded that optomotor response to apparent 
motion is innate in humans. 

A certain rate of movement of a stimulus 
across the retina (or even simulated move- 
ment) seems to act, in conjunction with eye 
movements, to provide an optimal stimulus 
for vision. 


Novelty 


The novelty-familiarity dimension can be 
viewed as a function of stimulus repetition 
or the length of time a stimulus is exposed, 
and it is in this sense that it is used here. 
The literature on novelty is extensive, and 
therefore studies are selected to indicate the 
main trends and the most significant aspects 
of the data. Studies in which familiar and
novel stimuli are used are important from at
least three points of view. 

First, the phenomenon of response decre- 
ment to a repeated stimulus may be indica- 
tive of an early cognitive process (Lewis, 
1970). That such a process is involved is
suggested by the positive correlation obtained
by Lewis, Goldberg, and Campbell (1969)
between rate of decrement and performance
on concept formation and discrimination-learning
tasks for subjects 44 months old.
Such demonstrations have not been made,
however, with very young infants. The interpretation
that a central process is involved is
feasible only in cases where satiation due to
receptor or effector fatigue can be ruled out
(Lewis, 1970). In order to discount a factor
such as general fatigue, it would seem necessary
that the paradigm used show response
decrement to a familiar stimulus relative to
a novel one presented simultaneously, or show
recovery of responding to a novel stimulus
after habituation to another stimulus has
taken place, or at least show that the decrement
is a function of some stimulus dimension.
Even these precautions, however, do not
necessarily rule out sensory fatigue. Therefore,
while the interpretation of response
decrement as a cognitive process is quite 
plausible, it requires further documentation. 
This approach also may show something 
about the memory of the infant. The rationale 
is that, if an infant decreases his response 
to a stimulus that is repeated, he must in 
some sense recognize it as something he has 
seen before, that is, remember it. 

While there is some negative data (Fantz, 
1964; Meyers & Cantor, 1966, 1967), in- 
vestigators have shown in general that habitu- 
ation to a repeated stimulus occurs in in- 
fants and young children from 2 months of 
age (Caron & Caron, 1968, 1969; Fagan, 
1970; Fantz, 1966; Lewis & Goldberg, 1969; 
Pancratz & Cohen, 1970; Saayman, Ames, & 
Moffett, 1964). A recent study by McGurk 
(1970) has shown this phenomenon in a small 
group of infants from 6 to 12 weeks old. The 
evidence that newborns may show habituation 
(Friedman, Nagy, & Carpenter, 1970) is 
suggestive, since response decrement for the 
two sexes showed an interaction with the 
pattern used. Several studies have shown 
greater response decrement with older as compared
to younger subjects (Ames;⁸ Fantz,
1964). Unfortunately, the results of another 
study yielding data to this effect (Lewis et al., 
1969) are weakened because reaction to the 
familiar stimulus was not measured relative 
to response to a novel one. 

As for the young infant's memory capacity, 
Pancratz and Cohen (1970) showed that 4- 
month-old males recognized a familiar stimu- 
lus after a 15-second interval, but not after 
5 minutes; however, the investigators felt 
that the effect may have been obscured by 
fatigue. Fagan (1970) found, on the other 
hand, that subjects 21 weeks old recognized 
a stimulus after a 2-hour interval; and sub- 
jects 13, 16, and 18 weeks old showed at- 
tenuation of the novelty effect when the same 
problem was presented over successive days, 
indicating that the subjects may have recog- 
nized the stimuli even after 24 hours. 


⁸ E. W. Ames. Stimulus complexity and age of
infants as determinants of rate of habituation of 
visual fixation. Paper presented at meeting of the 
Western Psychological Association, Long Beach, 
California, April 1966. 
A second source of information contained
in studies using repeated presentations of 
stimuli concerns the effects of stimulus dimen- 
sions. Caron and Caron (1968, 1969) found 
that when infants 3.5 months of age were 
repeatedly shown checkerboards of different 
complexities (different amounts of contour) 
the amount of fixation decrement was inversely
related to the complexity of the stimulus
(2 x 2 > 12 x 12 > 24 x 24). Ames (see
Footnote 8) found that infants 11 weeks old 
habituated to repetition of an 8 x 8 checker- 
board, while subjects 5.5 weeks old did not; 
the older subjects did not, however, habituate 
to repetition of a 24 x 24 checkerboard. In- 
fants of both ages showed similar decrement 
magnitude when presented with stimuli that 
were "simple" relative to preferences shown 
by infants of younger and older ages (2 x 2 
check for 5.5-week-olds and 8 x 8 for 11- 
week-olds). Thus, it appears that the magni- 
tude of decrement depends on both the age 
of the subject and the complexity (synony- 
mous here with amount of contour) of the 
stimulus. Some of the failures to obtain re- 
sponse decrement are probably due to the 
utilization of stimuli that are too complex 
for the infants observing them. 

Another stimulus factor has been shown to 
influence recovery from habituation. McCall 
and Melson (1969) showed that for 5.5- 
month-old males the magnitude of cardiac 
deceleration was larger to a moderate trans- 
formation (rearrangement) of a repeatedly 
presented standard than to greater departures 
from the standard, where the degree of de- 
parture was determined apparently through 
adult judgments. McCall and Kagan (1967b), 
using a long-term familiarization procedure, 
showed similar results for 4-month-old girls 
with measures of cardiac deceleration but not 
for fixation times. These investigators con- 
cluded that response to a novel stimulus may 
be a function of the degree of discrepancy 
between the new stimulus and the familiarized 
one. The shape of this function is somewhat 
questioned by the results of McCall and 
Kagan (1970) showing, in 4-month-olds, an 
increasing monotonic relationship between the 
amount of change (replacement of parts) in 
the standard and the increase in fixation from 
last repetition of the standard to presentation 
of the transformation. The results may depend
somewhat on the measure used and the
kind of transformation that takes place. 

A final source of interest in studies using a 
familiarity-novelty paradigm stems from their 
sensitivity to the discriminatory capacities of 
the infant. By experimentally attaching a 
certain degree of familiarity to a stimulus and 
having the subject respond to this familiar- 
ity, one shows that the stimuli themselves are 
discriminated. For example, Saayman et al. 
(1964), using a novelty paradigm, showed 
that 3-month-olds could discriminate a circle 
from a cross, whereas Fantz (1958) had 
shown no preference for one over the other. 
McGurk (1970) used three different proce- 
dures, a Fantz-type preference paradigm and 
two familiarity-novelty methods, with three 
age groups of infants. He found discrimina- 
tion of objects in different orientations (0 
degrees and 180 degrees) only when using the 
familiarization procedures and showed such 
discrimination in infants as young as 6-12 
weeks. 


Complexity versus Contour 


Many studies have attempted to relate the 
infant's responses to visual stimuli to the 
complexity of the patterns used. 

The initial finding was that infants generally
prefer patterns with the greatest
amount of contour (Berlyne, 1958), a finding
which led to the suggestion that the infant
responds to the dimension of complexity
and prefers the most complex stimulus offered.
Various definitions of complexity have been
used subsequently. In 1966, Fantz, apparently
viewing complexity as the number of
elements in the stimulus, found that subjects
less than 1 week to 6 months of age ([...]
the youngest subjects) preferred the most
complex stimuli of two schematic faces, two
ovals with two spots each (one in eye position),
and plain ovals. Hershenson (1964),
however, defined complexity as the number
of light-dark transitions in a checkerboard
stimulus and found that subjects 2-4 days
old preferred the least complex pattern. The
situation was at least partially clarified, however,
by the study of Brennan, Ames, and
Moore (1966) showing that preference for
complexity was age related. Their 3-week-old
subjects preferred 2 x 2 checks, 8-week-olds
looked longer at 8 x 8 checks, and the oldest
subjects (14 weeks) preferred the most
complex pattern (24 x 24 checks).

Using a physiological technique, Harter 
and Suitt (1970) obtained visually evoked 
cortical potentials from one human infant
from 21 to 155 days of age in response to 
checkerboard-patterned light flashes. The 
largest amplitude responses were evoked by 
relatively large checks during the first month 
of life and by progressively smaller checks as 
the infant matured. Since the size of the 
checks varies inversely with the number of 
light-dark transitions for a checkerboard pat- 
tern, these data generally support the findings 
of Brennan et al. (1966). 

Thomas (1965) used another method of 
defining complexity. He showed stimuli of 
four complexity levels, defined by the judgments
of nine students, to younger (2-14
weeks) and older (15-26 weeks) infants. It
was found that the distribution of visual fixations
among differentially complex stimuli imposed
an ordered relationship on the stimuli
and that there was some support for the hypothesis
that the older subjects would prefer
more complex stimuli than would the younger
subjects. Attneave (1957) had adults rate
shapes for complexity on a 7-point scale and
compared the judgments with physical characteristics
of the shapes. He found that the
number of turns in the stimuli was the best
predictor of complexity judgments.

One would hope, then, that number of
turns might also be related to infants’ preferential
rankings of stimuli, but the findings
have been mixed. On the positive side,
Hershenson, Munsinger, and Kessen (1965)
presented newborns with pairs of random
shapes varying in number of turns (angles)
and found looking time to be an inverted
U-shaped function, with 10-turn figures preferred
over 5- and 20-turn figures. With considerably
older subjects (9-41 months),
Munsinger and Weir (1967), using figures of
5, 10, 20, and 40 turns, found that preference
was an increasing monotonic function of
complexity. A developmental shift in preference
for complexity, defined by number of
turns, could be predicted, with the older children
preferring a larger number of turns.
For children between the neonatal stage and
9 months, one would hypothesize preference 
for a relatively large number of turns, but 
perhaps not the largest in the series. Spears 
(1964), however, showed 4-month-olds stim- 
uli varying in shape and color and found that 
neither contour, nor number of turns, nor 
symmetry resulted in an ordering of the 
stimuli consistent with a preference response; 
one stimulus, a bull's-eye, was preferred to 
other patterns. However, since a strong pref- 
erence for bull's-eyes has been demonstrated 
about this age, it is quite possible that this 
preference outweighed the other variables. In 
another study, Spears (1966) presented 4- 
month-olds with stimuli varying in number of 
turns and also in color. He showed no prefer- 
ence for any of the five regular polygonal 
shapes, but all Spears' figures may have been
too simple—they all had fewer turns than the 
number of turns most preferred by newborns. 
More damaging evidence to the hypothesis 
that complexity as defined by number of turns 
is an important determinant of infant atten- 
tion comes from the study of McCall and 
Kagan (1967a). They showed random shapes 
of 5, 10, or 20 turns to 4-month-olds and 
found that neither number of fixations nor 
fixation time was related to number of turns; 
these measures were functions, however, of 
the amount of contour in the figures. 

Two complexity studies have used changing 
lights as stimuli. Cohen (1969) showed in- 
fants a light that remained stationary or 
changed among 4, 8, or 16 positions. He found 
that subjects fixated longer on a stimulus with 
intermediate change, that is, on a four-position
light. Haith, Kessen, and Collins
(1969) showed stimuli of three complexity 
levels, defined by varying the predictability 
of direction of light alteration, to subjects 
2-4 months of age. They found that the suck- 
ing response was consistently suppressed by 
presentation of the stimuli, but that limb 
movement was suppressed by the simplest and 
most complex stimuli and facilitated by the 
intermediate level. 

As for generalizations regarding the data on 
complexity, it seems best to conclude that 
there is no apparent stimulus dimension that 
will account for the preferences shown. This
conclusion is based on the conflicting nature
of the results and the fact that so many
different definitions of complexity have been 
used. 

A related line of research in which amount 
of contour is considered to be the significant 
variable in determining looking at patterns 
has been pursued by several investigators. 
The amount of contour is obtained by sum- 
ming the lengths of the light-dark transitions 
horizontally and vertically over the whole 
pattern. Recall that in Berlyne's (1958) early 
study the preferences shown were for the two 
patterns with the greatest amount of contour. 
As was noted above, also, McCall and Kagan 
(1967a) found, at 4 months, an inverted 
U-shaped relationship between fixation time 
and length of contour in a set of achromatic 
meaningless designs. McCall and Melson 
(1970) in addition have found fixation to be 
a function of the contour length in arrange- 
ments of squares, at 5 months of age. 
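
The contour measure described above (summing the lengths of light-dark
transitions horizontally and vertically over the whole pattern) can be
illustrated for a pattern laid out on a square grid. The following is a
minimal sketch under stated assumptions: the names `contour_length` and
`checkerboard`, the 0/1 cell coding, and the decision to ignore the outer
border of the pattern are all choices made here for illustration and are not
taken from the studies cited.

```python
def contour_length(pattern, cell_size=1.0):
    """Sum light-dark transition lengths horizontally and vertically.

    `pattern` is a list of equal-length rows of 0 (light) / 1 (dark) cells.
    Each boundary between two adjacent cells of different value contributes
    one cell side (`cell_size`) of contour. The outer border of the pattern
    is ignored (an assumption of this sketch).
    """
    rows = len(pattern)
    cols = len(pattern[0]) if rows else 0
    transitions = 0
    for r in range(rows):
        for c in range(cols):
            # Transition between this cell and its right-hand neighbor.
            if c + 1 < cols and pattern[r][c] != pattern[r][c + 1]:
                transitions += 1
            # Transition between this cell and the cell below it.
            if r + 1 < rows and pattern[r][c] != pattern[r + 1][c]:
                transitions += 1
    return transitions * cell_size


def checkerboard(n):
    """Build an n x n checkerboard of alternating 0/1 cells."""
    return [[(r + c) % 2 for c in range(n)] for r in range(n)]
```

For checkerboards of a fixed overall size, smaller checks give more contour
under this measure: with a hypothetical side of 8 units,
`contour_length(checkerboard(2), cell_size=4.0)` gives 16 units of contour,
while `contour_length(checkerboard(8), cell_size=1.0)` gives 112.
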

Karmel (1969a) showed that infants 
68-148 days old spent more time fixating a 
pattern with a greater amount of contour 
whether this was arranged in a random or 
redundant fashion. Preference decreased, 
however, as the amount of contour became 
very great. These findings were verified by 
Karmel (1969b) with 13- and 20-week-old 
infants, using patterns with black and white 
elements of four different sizes (3, 1-, 1-, 2- 
inch checks) and two arrangements (random 
and redundant). The best description of look- 
ing behavior was an inverted U-shaped func- 
tion of preference to the square root of the 
amount of contour. Older subjects preferred 
patterns with greater amounts of contour, so 
that the U-shaped function shifted upward 
with age. Karmel calculated the amount of 
contour in the checkerboard patterns used by 
Brennan et al. (1966) and Hershenson 
(1964), using the data to fill in his family of 
U-shaped functions at younger ages. He con- 
cluded that his data were consistent with 
those of Hershenson and Brennan et al. 
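
One way to make the inverted U-shaped description concrete is a quadratic in
the square root of the amount of contour. The sketch below assumes that
functional form purely for illustration; the names `preference` and
`peak_contour` and the coefficients `a`, `b`, and `c` are hypothetical
placeholders, not values or equations reported by Karmel.

```python
import math


def preference(contour, a, b, c):
    """Assumed inverted-U model: a + b*s - c*s**2, where s = sqrt(contour)."""
    s = math.sqrt(contour)
    return a + b * s - c * s * s


def peak_contour(b, c):
    """Amount of contour at which the assumed inverted U reaches its peak."""
    if b <= 0 or c <= 0:
        raise ValueError("an interior peak requires b > 0 and c > 0")
    s_peak = b / (2.0 * c)  # vertex of the quadratic in s
    return s_peak ** 2
```

Under this assumed form, a larger `b` (with `c` held fixed) moves the peak
toward greater amounts of contour, which is one way to picture the reported
shift of the function with age.
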

Karmel, White, Cleaves, and Steinsiek⁹ 
have pursued this work with the same pat- 
terns presented as light flashes, measuring 
averaged evoked potentials. Their preliminary 
findings show that the physiological responses 
are described by functions very similar to 
those obtained behaviorally. These data are 
in essential agreement with those of Harter 
and Suitt (1970). The physiological data lend 
support to Karmel's interpretation that his 
behavioral preferences are related to neuronal 
activity of cells responsive to contour infor- 
mation in the visual system. Such cells have 
been investigated (Hubel & Wiesel, 1962; 
Wiesel & Hubel, 1966) in other species. The 
peak point of the U-shaped function may 
represent the model receptive field size charac- 
teristic for the species at that specific age, the 
size being related in some optimal fashion to 
the acuity level of the organism. This work 
provides perhaps the best specification of a 
characteristic of stimulation to which the in- 
fant is responsive, and to which the response 
changes over time, whether or not one wants 
to call amount of contour by the term 
"complexity." The fact that similar findings 
are obtained using different techniques and 
different patterns lends additional weight to 
its validity. 


⁹ B. Z. Karmel, C. T. White, W. T. Cleaves, and 
E. E. Steinsiek. A technique to investigate averaged 
evoked potential correlates of pattern perception in 


Faces 


It is a common observation that infants 
manifest a great deal of interest in the human 
face. The question of relevance for form per- 
ception is what characteristics of the face 
stimulus are responsible for the interest it 
elicits and for the differential responses to it 
at different ages. Is the human face perceived 
as a whole, or are the aspects attended to 
merely features or combinations of features? 
In trying to answer these questions investi- 
gators have used several measures: smiling, 
fixation, and cardiac response rate. 

In terms of response to the face, the lines 
to be pursued are two: an attempt to deter- 
mine the stimulus for elicitation of interest 
in the face, and a discussion of the correspon- 
dence or lack of correspondence among some 
of the responses used to measure reaction to 
the face. E. J. Gibson (1969) provided an 
analysis of the data on faces from birth to 
7 months of age in terms of the stimulus 
characteristics to which the infant is respon- 
sive at various ages. The following dis- 
cussion is guided somewhat by her grouping 
of studies, but includes evidence not in 
agreement with her interpretations. 

In spite of the finding of Spitz and Wolf 
(1946) that infants less than 20 days old did 
not smile at a human face, others have shown 
that infants from birth to 1 month of age 
will respond (fixate the stimulus or show 
increased arousal) to a live face (Stechler & 
Latz, 1966) or a schematic face (Fantz, 
1963). It is impossible to ascertain from these 
studies, however, the aspects of the stimuli 
that attracted attention. Several studies 
have found no preference in neonates for dif- 
ferent arrangements of photographic faces 
(Hershenson),¹⁰ drawings of faces (Hershen- 
son, Kessen, & Munsinger, 1967), or sche- 
matic faces (Fantz, 1966). Feature arrange- 
ment may be too subtle a variable to be 
discriminated by the newborn. Wolff (1963) 
found, however, that 3-week-olds look at the 
eyes in a real face. This may represent the 
beginning of what Gibson calls feature dif- 
ferentiation of the face. Whether or not the 
eyes are attended to as a facial feature is 
questionable. While their attractiveness may 
be similar to that of any contrasting or 
moving object, Carpenter, Tecce, Stechler, 
and Friedman (1970) concluded that the be- 
havior of infants as young as 2 weeks toward 
their mothers’ faces suggested the influence 
of past associations. Therefore, it is a pos- 
sibility that some aspect of the face has 
acquired meaning even at this young age. 

Wolff (1963) reported that, from 1 to 2 
months of age, smiling is elicited by nodding 
of the face and eye-to-eye contact with a 
human face. However, Ahrens (1954) found 
that at this age simple dot or angle patterns 
were also sufficient to produce smiling. Salzen 
(1963) found, with one infant, that a rotated 
black and white sectored disk elicited smiling. 
Therefore, while Wolff’s subjects may have 
been reacting to facial features per se, it is 
unnecessary to draw this conclusion. While 
smiling is usually assumed to have affective 
overtones, at a young age, simple meaning- 
less stimulus characteristics are apparently 
sufficient to elicit this response. 


¹⁰ M. Hershenson. Form perception in the human 
newborn. Paper presented at the Second Annual 
Symposium, Center for Visual Science, University of 
Rochester, June 1965. 

Ahrens (1954) reported that between 2 and 
3 months of age, as at younger ages, the eyes 
still elicit prolonged smiling even when the 
face beneath them is blank; therefore, very 
simple stimuli continue to give rise to this 
response. However, Watson (1966) also re- 
ported discrimination of orientation of real 
faces at 0, 90, and 180 degrees for subjects 
14 weeks old, with more smiling at the up- 
right faces. That the eyes may be the effective 
feature is suggested by Fantz (1967) who 
found that infants from 2 to 3 months of 
age look longer at two dots in eye position 
than in another position. This suggests, fur- 
ther, the possibility that infants at this age 
may prefer certain arrangements of features, 
but the findings on this point are inconclusive. 
While Fantz (1967) concluded that subjects 
from 2 to 3 months fixate more on a regular 
schematic face than on a scrambled one, 
Koopman and Ames (1968) could not repli- 
cate these results, and Fantz and Nevis 
(1967) obtained such a preference only at 20 
weeks of age. Fagan (1970), however, found 
that 13-week-old subjects could discriminate 
among three arrangements of a set of non- 
facial stimuli consisting of four black squares 
on a white background. 

A number of studies have compared regular 
faces with scrambled versions of the same face 
with subjects at slightly older ages, with 
somewhat diverse results. Kagan and his 
associates have found that at 4 months regu- 
lar faces are differentiated from scrambled, 
but the results depend on the response mea- 
sure used. Kagan (1967) reported that 4- 
month-old subjects show no difference in total 
fixation time, first fixation time, or vocaliza- 
tions to regular versus scrambled three- 
dimensional sculptured faces painted flesh 
color, whereas the subjects smiled more to 
the regular face. Kagan, Henker, Hen-tov, 
and Lewis (1966) replicated these results, also 
using 4-month-olds, and, in addition, found 
that large decreases in heart rate were more 
frequent to the regular face. 

Kagan (1967) reported another study 
showing a dependence on the response mea- 
sure. He found that infants at 4 months 
showed equal total and first fixations to a 
photo of a man's face versus a schematic 
drawing of a face, but significantly more 
smiling and greater cardiac deceleration to 
the photo. Haaf and Bell (1967) showed that 
subjects of 4 months gave a transitive re- 
sponse (fixation) ordering of four stimuli 
concomitant with the resemblance of the 
stimuli to the human face, with more fixation 
to the more facelike. The Haaf and Bell study 
is in agreement with Kagan with respect to 
preference for the more realistic face, but is 
divergent with respect to the response measure 
used to show it. Wilcox (1969), too, found a 
fixation preference for a photo of a face versus 
a realistic drawing of a face in 16-week-old 
subjects, but the drawing contained more gray 
and less black and white, so the results could 
be due to differences in contour contrast. 

By 4 months of age or before, then, the 
infant can discriminate feature arrangement, 
although this discrimination may be based on 
the positions of only one or more of the 
features. That this may be the case is sug- 
gested by Ahrens’ (1954) finding that the 
mouth was differentiated as a feature only 
around 5 months. A longer time may be re- 
quired for some of the more subtle facial 
features to be discriminated. At 6 months of 
age a male face is differentiated from a fe- 
male face, as shown in terms of vocalizations 
(Kagan & Lewis, 1965). Wilcox and Clayton 
(1968) found no differences in response to 
facial expression at 5 months of age, a finding 
in agreement with Ahrens (1954) who was 
unable to find differential smiling to different 
facial expressions before 7 months of age. 

In sum, the infant’s initial response to faces 
is adequately explained on the basis of 
response to one or several characteristics 
(features) of the stimulus. There is no evi- 
dence that feature arrangement is discrimi- 
nated by newborns, although by 2 months of 
age there are some inconclusive data showing 
preferences for a regular face. At 4 months 
of age most studies show preferences for a 
regular face, with regard to some response 
measures. It is not clear what this preference 
means. It does mean that arrangement of 
features is discriminated, although it does not 
necessarily mean that all the features have 
been differentiated, as the discrimination could 
be made on the basis of one or several features. 
That a regular face is preferred indicates the 
possibility that the face has acquired some 
meaning, that is, positive associations, and 
is recognized as a familiar stimulus. Carpenter 
et al.’s (1970) data suggest further that this 
may occur long before 4 months. 

Kagan (1970) contended that over time a 
face schema (internal representation) is built 
up, and the infant’s responses to faces are 
in relation to this schema. Maximum atten- 
tion is elicited by stimuli representing emer- 
gent schemata and moderate violations of 
such schemata. While Kagan believed that 
the development of a schema underlies differ- 
ential response to faces, Gibson (1969) felt 
that the schema does not underlie discrimina- 
tion but follows it. Gibson contended that the 
primary process is selection of basic stimulus 
characteristics, followed by progressive feature 
differentiation, by relationships between the 
features, and, finally, by characterization of 
the whole array as distinct from other 
arrays. This does not preclude attachment of 
“meaning” to one or several features prior 
to differentiation of the whole array. Let it 
be said simply that Gibson's interpretation 
of a schema following rather than preceding 
perceptual selectivity would seem to be the 
more economical interpretation of the data 
to date. 


Gestalten and Organizational Features 


Bower (1965) studied three gestalt deter- 
minants of “perceptual unity,” defined as the 
fact that objects of perception are seen as 
unitary coherent wholes. For each determi- 
nant, common fate, proximity, and good con- 
tinuation, a film sequence showed a stimulus 
transformed in a consonant manner and in a 
contradictory manner. It was expected that 
the latter would elicit more “surprise,” indi- 
cated by a greater decrement in sucking. He 
found that common fate was an effective 
principle at the youngest age (4 weeks) with 
the others only later, at 20-30 weeks. How- 
ever, an attempted replication of the common 
fate results with subjects 10 weeks old gave 
results in the opposite direction from those 
obtained with 4-week-olds. 

Bower (1966a) tested the law of hetero- 
geneous summation, that is, that a whole is 
equal to the sum of its parts, as opposed to 
the gestalt law that the whole is greater than 
the sum of its parts. He conditioned a left 
head-turn response in infants 8, 12, 16, and 
20 weeks of age to a figure containing three 
parts (conditioned stimulus). Subjects were 
then tested for generalization of the response 
to the three parts separately. Only at 20 
weeks was the response to the whole greater 
than to the sum of the parts; at 8 and 12 
weeks responses to whole and addition of 
responses to the parts were almost identical. 
However, most of the response to parts, espe- 
cially at a younger age, is accounted for by 
response to a circle (part of the whole). The 
whole and the circle part were somewhat 
three-dimensional, that is, they were two- 
dimensional figures raised from a background, 
whereas the other parts were flush with the 
background. This would seem to render the 
data, particularly at a younger age, question- 
able. In a further test of the opposite propo- 
sition that parts could be added to produce 
a whole that was functionally equivalent to 
their sum, Bower found, at 10 weeks, no 
difference between whole and parts; however, 
at 20 weeks the responses to the parts were 
much more than to the whole. 

In general, the contradictory nature of 
these findings renders these data inconclusive, 
and does not provide consistent support for 
either a gestalt position or a position similar 
to Hebb’s, which would seem to predict that 
response to whole and addition of responses 
to parts would be equal. 

In another study by Bower (1966b), the 
conditioned stimulus was a wire triangle with 
an iron bar in front of it. Test objects were 
a wire triangle plus three modified versions 
of it (see Figure 3). The fact that the 50-60- 
day-old subjects showed most response to the 
full triangle seemed to indicate that they had 
seen the conditioned stimulus as a triangle 
with a bar over it, that is, that they had 
filled in or completed the figure. It is inter- 
esting to note that when the above experiment 
was repeated using slides of the stimuli, there 
was no preference for one test figure over any 
other; apparently, subjects needed the infor- 
mation available to a mobile organism viewing 
a three-dimensional array. Perhaps Bower's 
other findings would have been more con- 
sistent had all three-dimensional stimuli been 
used. As stated previously, there is some evi- 
dence to support Gibson's (1969) position 
that the kind of information in solidity should 
be available from the beginning, and Bower's 
data suggest that such information may be 
primary. 

Fig. 3. Triangular figures used by Bower (1966b). 

Also along gestalt lines is the study of Lang 
(1966), which measured infants' reactions to 
such stimulus features as “good” (circular) 
and “bad” (irregular) forms and continuity 
(solid line) versus discontinuity (dotted line). 
Responses were ratings along a 9-point scale 
of pleasureful relaxation versus aversive ten- 
sion. There were no differences shown at 8 
weeks of age, but at 10 weeks subjects were 
more relaxed toward “good” forms and tense 
toward “bad” forms, though there was still 
no difference toward the continuity-discon- 
tinuity figures. The latter perhaps indicates 
again the subjects’ tendency to “fill in” or 
complete forms, while the former is in line 
with the infant's strong preference for some 
circular stimuli at this age. 

A final experiment on perceptual organiza- 
tion deals with the organization of the re- 
sponse of visual tracking in subjects 42-133 
days old. Nelson (1968) presented repeti- 
tions of a left-to-right sequence of six lights. 
When the subject was judged to be tracking 
the lights, the sequence was interrupted, that 
is, two adjacent lights did not go on. The 
tracking response took some time to organize, 
but after a few sequences, tracking began 
with the leftmost lights, and, as the session 
proceeded, there was a greater and greater 
tendency for a subject to "hit" (fixate) the 
last light even after interruption. The results 
are interpreted as supportive of Mandler's 
(1964) “completion tendency": once an orga- 
nized response has been interrupted, it is 
assumed that a tendency to completion per- 
sists as long as the situation remains essen- 
tially unchanged. One could also look at the 
results as indicative of the gestalt principle of 
“filling in” or completion of the stimulus. 

These studies have shown that between 
approximately 6 weeks and 5 months of age 
infants display some of the gestalt character- 
istics of form perception, though many of the 
data are inconclusive. The data are most 
strongly in support of an early completion 
tendency and suggest further the importance 
of three-dimensionality in the perception of 
the young infant. 


RESPONSE TO VISUAL PATTERNS: PROBLEMS 
AND TECHNIQUES 


Multiple Response Measures 


As has been observed in a number of studies 
(e.g., Kagan, 1967; Kagan et al., 1966), at a 
given age one response measure may show 
differentiation among stimuli, while another 
may not. Differential results have been found 
even among various fixation measures. Lewis, 
Kagan, and Kalafat (1966) found that first 
fixation responses result in greater differentia- 
tion among stimuli than do total fixation mea- 
sures and that there is no positive, and some- 
times a negative, relationship between number 
of fixations and total fixation. These measures 
may be tapping different segments of the at- 
tentive process. The same response measure 
also may change in meaning over time. Kagan 
(1970) contended, for example, that an in- 
fant may fixate a particular stimulus at 4 
months of age because it is optimally dis- 
crepant from some schema, and at 1 year 
of age because the stimulus activates cer- 
tain hypotheses. All of these considerations 
argue for the recording of multiple response 
measures insofar as possible. 


Indications of Lack of Preference 


The fact that the usual visual preference 
techniques have the disadvantage of providing 
information only if preferences do occur has 
been noted by a number of investigators 
(Gibson & Olum, 1960; Saayman et al., 
1964). There are several ways of augmenting 
the information yielded by these studies, in 
terms of increasing the chances of obtaining 
evidence of discrimination. 

One method is to attach experimentally 
some property such as differential familiarity- 
novelty to the stimulus as demonstrated in 
the study of Saayman et al. (1964). By 
reacting differentially to the degrees of nov- 
elty possessed by different stimuli, subjects 
give evidence of discrimination of the stimuli. 
Another method of increasing information 
about discrimination, as we have seen, is to 
use multiple response measures; one mea- 
sure may show differences to stimuli, while 
another may not. A third device is utilization 
of a learning procedure, such as that used by 
Bower. Here, since there is reinforcement in- 
volved, there is an advantage in making the 
"right" response to the stimuli perceived as 
"correct" (in terms of belonging to a par- 
ticular class). However, this is sometimes a 
relatively fatiguing procedure and in some 
cases requires specific developments in mus- 
culature, and thus its application is limited. 


Basis for Discrimination 


Even if a preference does occur, according to 
Gibson (1969), it “seldom gives firm evidence 
of exactly what it is that has attracted the 
infant’s attention [p. 324].” A real question 
can be asked as to what the subject is looking 
at, that is, on what basis he is making the 
discrimination. This is the problem of stimu- 
lus control, the pinpointing of variables to 
which the subject can and does respond. 
Hershenson (1964) showed, for example, that 
newborns can discriminate stimuli of differ- 
ential brightness with patches of intermediate 
brightness preferred to brighter and dimmer 
ones. Thus, brightness, or any phenomenally 
dissimilar aspects of the stimulus situation, 
could serve as a basis for discrimination. 

The basis for a discrimination can be 
determined in several ways. One may get 
some information on this issue from the use 
of multiple responses recorded at the same 
time, although this seems like quite an im- 
perfect method, particularly with the current 
confusion over what various responses mea- 
sure. Another method is to record different 
kinds of responses to equivalent stimuli in 
separate experiments. For example, Karmel 
(1969) obtained the same kind of function 
recording behavioral and physiological re- 
sponses to differentially contoured patterns. It 
would be difficult to record these simultane- 
ously since the stimuli are presented in dif- 
ferent forms: pairs of static patterns versus 
patterned light flashes. Another technique 
that emphasizes the kind of function ob- 
tained consists of trying to obtain a correla- 
tion between some aspect (e.g., duration) of 
a response and value along some variable 
dimension of a display (e.g., the amount of 
contour in the pattern). Hershenson (1967) 
referred to this as obtaining a transitive 
ordering of response; he contended that only 
when the responses constitute an ordered set 
can the effective stimulus be specified. He 
lamented the fact that we have made so little 
progress toward definition of dimensions along 
which responses can be ordered. A final 
technique for determining what the subject 
is looking at is to observe orienting responses 
correlated with contour or fixation on par- 
ticular regions, in other words, to record, via 
sensitive techniques, the exact part of the 
array to which the infant’s gaze is directed, 
as well as the patterns of his looking behavior. 
The visual scanning studies using this tech- 
nique were described earlier. 
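
The notion of a transitive ordering of response can be illustrated with a brief sketch. The Python fragment below is an assumption-laden illustration, not Hershenson's procedure: it treats each stimulus as having a single value on some dimension (for example, amount of contour) and a single mean response (for example, fixation duration), checks whether the responses are strictly ordered along that dimension, and computes a rank correlation as a graded index of the same relationship. All function names are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def is_ordered(stimulus_values, responses):
    """True if responses strictly increase with the stimulus value."""
    pairs = sorted(zip(stimulus_values, responses))
    return all(a[1] < b[1] for a, b in zip(pairs, pairs[1:]))
```

A strictly monotonic check of this kind would not capture an inverted U-shaped relationship such as the one reported for amount of contour; in that case the ordering would have to be examined separately on each side of the peak.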


Problems Related to Basic Capacities 


An additional problem in recording re- 
sponses to visual patterns concerns their rela- 
tionship to the basic visual capacities of the 
infant. Since there is some question as to how 
frequently the newborn fixates binocularly, it 
is probably best at present to circumvent the 
problem by recording monocularly. However, 
one then must consider the difference it might 
make in terms of perceptual integration 
whether the infant receives binocular or 
monocular input and the percentage of the 
time he receives one or the other. Thus, 
Hershenson (1967) asked whether fusion of 
information could occur with partially cor- 
related input. That recording from one eye 
may be sufficient for some purposes, however, 
is suggested by the discrimination data of 
Miranda (1970). His observations of the 
visual preferences of newborns were made 
mainly from recordings of the right eye, yet 
the same preferences were obtained from re- 
sponses to contralateral as to ipsilateral tar- 
gets. Also to be considered are the questions 
concerning the flexibility of the accommoda- 
tory process of newborns, particularly when 
using moving stimuli. Fortunately, the matu- 
ration of the visual system makes great prog- 
ress in the first few months, so that these 
problems are largely irrelevant, within limits, 
after a very young age. 


Nonattentive Behaviors 


A final point considered in discussing pat- 
tern preference techniques concerns what this 
writer considers to be an important lack in 
the data. We are told in some cases the 
amount of time the subjects spend looking at 
the stimuli and the portion of looking time 
devoted to one stimulus versus another, but 
in many cases the time spent in looking at the 
stimuli is far less than the total time for 
which the stimuli are presented. An important 
factor is what the infant is doing when he is 
not looking at the stimulus. Stechler and Latz 
(1966) found that infants at a certain age 
tended to show withdrawal behavior (turning 
away) when presented with a real face. 
Carpenter et al. (1970) measured not only 
looking time but also such nonattentive be- 
haviors as looking away, peripheral viewing, 
and closing eyes. They found that infants 
1-8 weeks old looked less at and showed more 
nonattentive behavior to presentation of the 
mother’s face over presentation of a dummy- 
model face or abstract three-dimensional 
stimulus. Further investigation into the char- 
acteristics of patterns of such behaviors could 
help to elucidate the nature of the active con- 
trol over stimulus input shown by the infant. 
It would be interesting if these measures 
could be obtained using nonsocial stimuli as 
well. 


DISCUSSION 


The data reviewed give us some informa- 
tion about the perceptual characteristics and 
abilities of the young human infant. How 
much the data actually tell us about whether 
or not form perception is given at birth is de- 
batable. With our present state of knowledge 
and technology, it seems best to conclude that 
the answer is a matter of definition, as Fantz 
(1967) suggested. If we demand that the 
neonate respond to a stimulus as a whole, we 
are faced operationally with showing that he 
uses a stimulus as a whole, which requires 
capacities, such as memory, that may not be 
sufficiently developed in the neonate. If we 
are willing to accept the neonate's selective 
attention to form characteristics such as con- 
tours and angles and his first visual prefer- 
ences as indicative of the earliest perception 
of form, and if we are willing to accept the 
contention that later perceptual responses do 
not differ from those in any essential way, 
then we can say that the neonate does per- 
ceive form. There seems to be nothing in the 
data to date that makes it necessary to 
abandon this latter position. 

The data indicate that the early responses 
of the infant can be accounted for largely 
by his attraction to relatively simple charac- 
teristics of stimulation, such as contours, 
angles, and stimulus change. This contention 
is based primarily on data showing visual 
scanning patterns, and responses to stimuli 
with differential change and contour. At least 
part of the explanation for the selection of 
this kind of information may be found in the 
presence of neural coding mechanisms opti- 
mally responsive to certain types of stimula- 
tion. Whether responses to stimuli of this 
nature represent perception of configurations 
as the gestaltists would require, or whether 
they should be viewed as responses to isolated 
elements, seems in this case to be more a 
matter of the way in which one views 
stimulation than a matter of the capacities of 
the infant. For example, for many figures, 
whether one views a contour as a figure 
against a ground or as an abrupt change in 
brightness, there is no differential reaction 
that would be expected from the subject. This 
sort of definition is nonoperational. 

Even within the general stimulus character- 
istics to which the neonate responds, there 
seems to be some selective tuning, based on 
the kind of information-gathering activity 
that the newborn exercises with the most 
facility. For example, the neonate selects ver- 
tical contours for active scanning, based par- 
tially, it seems, on his relatively good ability 
to scan horizontally. Presumably, as his fa- 
cility to scan in different directions matures, 
he should as readily select horizontal contours. 
This emphasis on the importance of develop- 
ing good information-gathering techniques is 
stated most explicitly by the Russian motor 
copy theorists. 

That the newborn responds to information 
once thought of as relatively sophisticated is 
seen in several instances. The mere fact that 
he prefers to look at patterns is an advance 
over the former view that the infant is most 
attracted by colors. That infants under 2 
months discriminate flat from solid objects 
shows that at least some of the information 
in three-dimensionality is picked up at a very 
young age. Bower’s data suggest further that 
such information may be primary for the per- 
formance of certain kinds of discriminations. 
These findings with three-dimensional stimuli 
are most in line with the viewpoints of Gibson 
and of the gestaltists. Whether or not the 
young infant responds to the organizational 
features posited by the gestaltists is not clear, 
due to the contradictory nature of the find- 
ings. There is some evidence, though, that the 
young infant perceptually “fills in” incom- 
plete stimulus figures, and thus behaves in 
this respect in the manner which gestalt 
psychology would predict. 

There are a number of developmental 
changes that have been shown to occur, and 
some which, while they looked promising at 
one time, have yielded mixed findings. In the 
latter category are the findings on complexity 
and the shift in preferences from linear to 
circular stimuli at about 2 months of age. One 
developmental change that has been fairly 
well documented is the upward shift in the 
amount of contour preferred in a figure. In 
line with this preference are the data showing 
that infants at a certain age will habituate 
more rapidly to patterns with relatively little 
contour than to patterns with more contour. 
This finding is a good example of one 
supporting the contention that the infant 
is capable of handling more information 
with age. 

That the infant may also respond to more 
subtle stimulus variables, as Gibson would 
emphasize, is suggested by the findings on 
discrimination of arrangement of features. 
Although one investigator (Fantz, 1967) 
showed discrimination of this stimulus feature 
by subjects 2-3 months of age, most of the 
findings have been negative with respect to 
arrangements of geometric stimuli or facial 
features until 3-4 months of age. While some 
of the negative findings may be due to the 
techniques used, the infant may on the other 
hand be unable to make such discriminations 
for the first few months. Response to more 
subtle stimulus variables with age is seen 
also in the eventual ability of the infant to 
discriminate facial expressions. 

Several studies document the finding that 
the neonate seems to be “captured by the 
stimuli” while the older infant seems to be 
“capturing the stimuli by his visual behavior.” 
In line with this observation are the data 
suggesting that habituation to stimuli becomes 
more rapid with age, and the difficulty that 
has sometimes been encountered in obtaining 
habituation in neonates. However, even the 
neonate appears to be somewhat in control 
of his input, selecting only certain features of 
a stimulus for inspection and passing over 
others. That the infant acquires greater control 
as he matures is undoubtedly the case. Some 
of this control is provided by developments in 
motor capacities, but this is not the whole 
explanation. Another aspect probably has to 
do with the development, through maturation 
and learning, of central processes such as 
increased capacities to store and retrieve in- 
formation. That the infant becomes less 
“captured” by stimuli as he matures is in 
line with his developing capacity to handle 
more information. The more readily he can 
turn his gaze from one stimulus to another, 
the greater his potential for gathering new in- 
formation. And, concomitantly, the more 
information he picks up, the more he exercises 
his abilities for selecting and rejecting input. 

The selective attention of the infant serves 
the purpose of directing his gaze toward cer- 
tain kinds of stimulation in the environment. 
At a very early age, the basic neural structure 
of the organism may take primary respon- 
sibility for this direction. Gradually, however, 
the effects of experience grow in influence, 
and stimuli become recognizable and mean- 
ingful. The selective attention of the infant 
becomes tempered by the kinds of stimuli to 
which he has been exposed (and to which he 
has exposed himself). That certain frequently 
encountered stimuli may become meaningful 
at a very early age is suggested by the study 
of Carpenter et al. (1970) in which presenta- 
tion of the mother's face elicited behavior 
suggestive of past associations in subjects as 
young as 2 weeks. Kagan and his associates 
have been successful in showing certain rela- 
tionships between inferred representations of 
past experience and present responding. They 
have shown, for example, that the infant is 
maximally responsive to new stimuli that are 
different, but not too different, from those 
experienced repeatedly in the past. Such 
findings characterize the information-seeking 
behavior of the infant as an active, orderly 
process not only in relation to built-in coding 
mechanisms, but also in relation to the 
particular history of the organism.


Theories of direct visual perception which claim that all the information for
perception resides in the structure of the ambient optic array are only partial
theories of perception. They run counter to evidence in neuropsychology and
experimental psychology which suggests that perception may be a process from
the inside out as well as from the outside in. The state of the organism at
any moment in time, measured by feedback from the central nervous system
itself, determines both what part of the structure of the optic array is relevant
at a given moment and how it will be interpreted.


This article first reviews Gibson's (1959,
1966, 1968, in press) theory of direct visual
perception. The review suggests that some of
the assumptions made by the theory have
not been adequately tested.

According to Gibson, there is permanent
stimulus information in the ambient optic
array. Gibson (in press) said: "The meanings
of the edge, of a falling-off place . . . , are
given in the optic array." More specifically
(Gibson, 1959), the effective stimulus for
perception must be sought in a textured
optic array, supplemented by the transforma-
tions relating a simultaneous pair of them
and by transformations relating sequences
of momentary arrays. According to Gibson,
it is this structure inherent in the optic
array that organizes visual perception in
conjunction with a brain that resonates to
this ready-made structure (Gibson, 1966).
There is no further need for perception to
be mediated by an internal model, let alone
constructed by the organism. Nor are there in-
ternal processes correcting or interpreting this
information. This is direct visual perception.
To be sure, Gibson (1966) recognized that
optic array information does not exhaust the
information received by the organism. It also
receives input about its own movements, for
example. Here again Gibson generally argued
that the relevant input, whether propriocep-
tive or exteroceptive, is already structured.
In addition to being structured, exteroceptive
and proprioceptive information can in impor-
tant ways substitute for one another. For
example, Gibson (1966) argued that move-
ment on the part of the individual can be
registered visually (through motions of the
ambient optic array) as well as directly by
proprioceptors. Notice that the point which
is made here is not restricted to the notion
of visual proprioception, which is that
movements are usually seen as well as felt.
Conversely, motions of the joints as
registered by proprioceptors can be caused
both by externally induced movements and by
movements initiated by the individual himself.
In general, it seems fair to conclude that
Gibson assumed that ambient optic array in-
formation will be sufficient for the perception
not only of events happening in the
environment but for the perception of the
organism's movements as well.

On the matter of whether information in
the ambient optic array has to be interpreted
by the brain, consider what Gibson said about
the perception of motion in the environment
and movement by the observer (Gibson,
1968). For Gibson, the stimulus for mo-
tion of an object is a lawful set of optic
relations at the eye having to do with a figure
in an array transforming itself with occlu-
sion effects. Gibson implied that this is the
optic stimulus information for motion of an
object in the environment. On the other hand,
the optic stimulus information for movement
of the observer is a transformation with
occlusion effects of a total optic array. This
information need not be interpreted by the
brain. Gibson (1966) was very explicit on this
point when he discussed the difference be-
tween his theory of motion and movement
and that by von Holst (1954) and von Holst
and Mittelstaedt (1950). Because of the
crucial importance of this issue, Gibson's
statement is quoted in full:


One theory suggests that whenever the brain sends
out a command for a certain movement it stores
a copy. When the input of any receptor reaches the
brain, it is automatically compared with the current
stored copy. If it matches, the input is taken to be
a case of proprioception—a feedback. If it does not
match, the input is taken to be a case of extero-
ception—a feed-in. In this theory, the input does not
itself specify its cause; the cause must be deduced
(von Holst & Mittelstaedt, 1950).
An alternative theory assumes that the neural
input caused by self-produced action is simply dif-
ferent from the neural input caused by an intruding
stimulus. The two kinds of input are different in
their sequential properties; they are different kinds
of transformation or change, and the simultaneous
pattern of nerve fibers might be widely dispersed.
In the long run, this second hypothesis may prove
to be the simpler of the two, for it does not pre-
suppose a brain that copies, stores, compares,
matches, and decides [italics added] [Gibson, 1966,
p. 39].


This second hypothesis, of course, represents
Gibson's own theory of the perception of
motion and movement.

Consider finally an added important feature
of Gibson's theory. Gibson stressed that am-
bient optic array information is not available
to a passive organism but, rather, that it must
generally be obtained by active movements
(of the eyes, head, limbs, or body) of the
perceiver, that it must be attended to,
searched out, etc. Gibson (in press) said:
"Perception is not supposed to occur in the
brain but to arise in the retino-neuromuscular
system as an activity of the system." He
also argued (Gibson, 1966) against the
conception of the one-way visual system and
stressed the circular character of the per-
ceptual process in which retinal inputs lead
to, say, ocular adjustments and, in turn, to
altered retinal inputs, etc. Further, he ex-
plored—to an extent—the properties of such
a circular system. Thus, for example, he
stated that successive inputs under successive
scans comprise a mathematical group, that
is, that the same given structure persists
throughout the series (Gibson, 1966). At this
point, Gibson seemed to say that the outer
environment as such does not have properties
of a mathematical group, but that such group
properties become available as the ambient
optic array is being transformed by the per-
ceiver's movements. That is, the organism's
own actions help to construct and thereby to
define the stimulus. However, Gibson appar-
ently did not accept this seemingly necessary
conclusion, for later in the study (Gibson,
1966) he stated: "The available stimulus
surrounding the individual has structure and
this structure depends on sources in the outer
environment [italics added: p. 267]."

The overall impression left by the theory 
of direct visual perception is that in spite of 
the seemingly important statements about 
motor mechanisms in perception, the theory 
is essentially concerned with the ambient 
optic array. How the structure of the optic 
array is produced and by whom is, in the 
final analysis, not considered to be germane 
to this theory of perception. But what if in 
some way the organism were to have informa- 
tion about its own activity, that is, about its 
own contribution to the construction of the 
stimulus, quite independently from action- 
produced peripheral feedbacks? Would it in 
that case not be limiting not to include this 
added source of information into one’s theory 
of perception? The consideration of this kind 
of possibility, however, runs counter to 
Gibson’s assumptions stated earlier that in- 
formation about the organism’s actions comes 
as a result of peripheral feedback from such 
actions, that information about the activity 
of the organism is really interchangeable with 
optic array information, and that exterocep- 
tion is a sufficient source of information. He 
did not discuss the possibility that the orga- 
nism might get information about its actions 
or about its own plans for action at a central 
level of the nervous system, in addition to the 
information it receives about action-produced 
peripheral feedback. If such a central process
were to operate, the organism’s information 
about its own activity could be considerably 
more independent of peripheral inflow and 
notably of exteroception than Gibson as-
sumed. Information would be available about
the events produced in the central nervous 
system (Konorski, 1967). This is infor- 
mation produced centrally about active move- 
ments and not about proprioceptive or 
exteroceptive events, both of which can be 
controlled externally, even though propriocep- 
tion may of course also result from self- 
initiated action. Under these conditions, feed- 
back from the central nervous system itself 
about the organism's own activity might have 
a direct effect on perception, possibly in the
manner implied by von Holst (1954)—that 
is, this kind of information could be used 
by a brain that "copies, stores, compares, 
matches and decides," to use Gibson's (1966) 
phrase quoted earlier. 

In order to explore whether, as Gibson 
claimed, optic array information is sufficient 
for predicting a given perceptual process or 
whether nonoptic variables having to do with 
the organism’s voluntary activity interact 
with optic information to produce a percept, 
a type of research is needed that has not 
been considered by Gibson. It is necessary 
to study whether, with the ambient optic 
array constant, other nonoptic events produce 
specifiably different perceptual effects. More 
generally, it must be explored whether vary- 
ing either optic or nonoptic parameters, 
singly or in combination, results in a different 
perceptual effect in each case. Several studies 
that have done this are mentioned in the 
following section. The argument is made that 
certain of the findings produced by this re- 
search would seem to require a modification 
of Gibson’s theory of direct visual perception 
in line with processes that have been alluded 
to above. 


EXPERIMENTAL FINDINGS ON THE INTER-
RELATION IN PERCEPTION BETWEEN OPTIC
EVENTS AND NONOPTIC EVENTS HAVING
TO DO WITH THE ORGANISM'S
OWN VOLUNTARY ACTION


The work of von Holst (1954) and von
Holst and Mittelstaedt (1950) on the per-
ception of motion and movement may first
be mentioned as an example of an ap-
proach to the study of perception that ex-
plores the relation between optic and nonoptic
variables in perception and thereby investi-
gates the role played by these variables
in the perceptual process. Recall, first, that
for Gibson (1968) the information for the
perception of the motion of an object is a
lawful set of optic relations at the eye having
to do with a figure in an array being trans-
formed with occlusion effects. Gibson implied
that this is the optic information for motion
on the part of an object in the environment.
On the other hand, the information for move-
ment of the observer is a transformation with
occlusion effects of a total optic array. At
first glance, none of this seems debatable,
given the normal everyday environment of
the mature perceiver. However, the study by
von Holst and Mittelstaedt (1950) with a
fly is instructive because it shows that the
same optic input can connote at one time
movement of the observer and at another time
motion by the environment. Of concern are
a series of three experiments in which, under
conditions in which afferent input from the
optic array remains the same, entirely differ-
ent perceptual interpretations of the afferent
signals were made by the fly. If a fly is put
in the center of a black and white striated
cylinder which is rotating, say, to the right,
the fly follows the movement of the cylinder
by also rotating to the right. (If the cylinder
is rotated to the left, tracking takes place to
the left.) Von Holst and Mittelstaedt pro-
posed that the fly's eye contains specific
neurons that control the observed optomotor
reflex. This conclusion was derived from the
additional finding that if the head of the fly
is surgically rotated 180 degrees in relation
to the body, so that left and right sides of
the eyes are reversed, and hence the visual
signals are also reversed, the fly turns opposite
to the direction of the rotation of the cylinder.
In a second experiment, involving a normal
fly, a stimulus such as smell was placed to
the left of the fly. The fly now moves to the
left in the direction of the new stimulus while
the cylinder remains stationary, and thus pro-
duces very nearly the same visual
inputs into its eye as in the first experiment
when the cylinder moved to the right and
the fly remained stationary—the only differ-
ence being that these inputs are now self-
produced. Under this condition, the fly does
not show the optomotor reflex that it showed
in Experiment 1. That is, the same afferent
input evidently no longer signifies a motion
to the right by the environment and does not
elicit a tracking motion in the appropriate
direction. The fly merely initiates left move-
ment and stops when it reaches the source of
the smell. Von Holst and Mittelstaedt (1950)
interpreted this result to mean that in this
second experiment efferent information is
somehow processed by the system and that it
contributes to the interpretation which is
placed on afferent input. In line with reflex
theory, it could be claimed that, during loco-
motion, the optomotor reflex observed in the
first experiment is simply blocked. To refute
this possibility, a third experiment was per-
formed that was identical to the second, ex-
cept that the head was surgically rotated 180
degrees. In this case, if motion to the right
on the part of the animal is started by a smell
to the right of the fly, the fly—experiencing
the same afferent input as in Experiment 1
and clearly the same as in Experiment 2—
will continue to move to the right in small
circles until exhausted. This, however, hap-
pens only if the environment is textured. In
an optically homogeneous environment, the
animal moves normally and stops at the loca-
tion of the smell. These findings suggest that
the optomotor reflex is not blocked during
motion by the animal but, rather, that the
behavior observed in both Experiments 2 and
3, and presumably in Experiment 1, is a
function of reafference and its relation to the
activity or nonactivity of the system.

Von Holst (1954) explained the results
of the above experiments by postulating a
built-in summation or comparison between
monitored efferent and afferent signals. Ac-
cording to von Holst, if the normal animal
moves to the left in a stationary environment,
an efferent copy of left movement as well as
the reafferent signals associated with such
movement (i.e., a movement of the visual
field from right to left across the retina) are
compared in the central nervous system. In
this case, these two types of signals, accord-
ing to a convention assumed by von Holst,
are of opposite sign and can cancel each
other. This means that this particular re-
afferent input will not signify to the animal
changes taking place in the environment it-
self, and it will not lead to optomotor action
by the animal of tracking (Experiment 2).
If the same afferent optic input is not accom-
panied by an efferent copy—as when the en-
vironment moves and the fly is stationary—
the meaning to the animal of the afferent
input is changed, and this case leads to track-
ing behavior to the right to follow the motion
of the environment (Experiment 1). If there
is efference as well as reafference, but if the
latter is reversed, these two types of signals
have the same sign, summate, and produce
the continued circling motion of the animal
(Experiment 3). What von Holst is thus say-
ing is that the organism records relations
between itself and the environment.
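
The summation described in this account can be made concrete with a small sketch. The Python fragment below is only an illustration under assumed conventions and is not von Holst's own formalism: signals are coded as -1, 0, or +1, the names `Trial`, `residual`, and `predicted_behavior` are hypothetical, and the signs are chosen so that an efferent copy and its normal reafference cancel. The optic input common to all three experiments is coded as +1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One experimental condition, coded with assumed unit signals."""
    name: str
    efferent_copy: int  # -1 = command to turn left, +1 = command to turn right, 0 = no command
    afference: int      # +1 = the optic input shared by all three experiments


def residual(trial: Trial) -> int:
    """Sum the efferent copy and the afferent signal (assumed sign convention)."""
    return trial.efferent_copy + trial.afference


def predicted_behavior(trial: Trial) -> str:
    """Map the residual signal onto the behavior described in the text."""
    r = residual(trial)
    if r == 0:
        # Efferent copy and reafference are of opposite sign and cancel.
        return "no optomotor response"
    direction = "right" if r > 0 else "left"
    if trial.efferent_copy * r > 0:
        # Efference and reversed reafference have the same sign and summate.
        return f"continued circling to the {direction}"
    # Afference with no efferent copy is read as motion of the environment.
    return f"tracking to the {direction}"


EXPERIMENTS = [
    Trial("Experiment 1: cylinder moves right, fly stationary", 0, +1),
    Trial("Experiment 2: normal fly turns left, cylinder stationary", -1, +1),
    Trial("Experiment 3: head rotated, fly turns right", +1, +1),
]

if __name__ == "__main__":
    for trial in EXPERIMENTS:
        print(f"{trial.name}: {predicted_behavior(trial)}")
```

The sketch keeps the afferent term identical across conditions, so any difference in the predicted behavior comes from the efferent copy alone, which is the point of the three experiments. The optically homogeneous environment is not modeled here.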

A set of findings similar to those of von 
Holst was reported by Sperry (1950). Work- 
ing with fish, Sperry found strong circling 
tendencies to result from surgically rotating 
an eye 180 degrees. In connection with these 
findings, Sperry (1950) carried out a series of
studies involving brain ablation and extirpation
of the vestibular system. On the basis of these studies
alone, Sperry could not conclude in favor of 
a theory which asserts that the relation be- 
tween optic and extraoptic factors is crucial. 
In fact, the findings that ablation of the 
optic tectum does interfere and that bilateral 
labyrinthectomy does not interfere with the 
optokinetic response decidedly rule in the
possibility of a purely optic explanation of 
the data. A labyrinthectomy—on the hypothe- 
sis of extraretinal kinetic factors—should 
have led to an interference with a circling 
response. However, the additional finding, as
in the von Holst and Mittelstaedt (1950) 
study—that exactly the same pattern of 
excitations from the retina may, in one in- 
stance, arouse a circling movement and not 
in another, depending entirely upon the direc- 
tion of the animal’s movement accompanying 
the retinal input—made a purely optic hy- 
pothesis unlikely to Sperry. The movement 
itself had to be brought in as a necessary 
determiner of the perceptual process. This 
argument, coupled with the findings in regard 
to the negative effect of extirpation of the
vestibular system on circling motion, led
Sperry to formulate the idea that the kinetic 
component does not arise peripherally but 
centrally, as part of the organism's efferent 
commands which elicit overt movement. He 
thus proposed what may be considered to be 
the equivalent of von Holst's “efferent copy” 
theory—namely: 


any excitation pattern that normally results in a
movement that will cause a displacement of the
visual image on the retina may have a corollary
discharge into the visual centers to compensate
for the retinal displacement [p. 488].


That the notions of corollary discharge and 
efferent copy are generally interpreted to be 
the same becomes evident from an exami-
nation of the literature (e.g., Paillard &
Brouchon, 1968; Rock, 1966; Teuber, 1960).

Whether the theory proposed by von Holst 
and by Sperry to explain their data is valid 
may be problematical. More is said about this 
in a later section. What needs to be pointed 
out here is that the results of their experi- 
ments contradict any hypothesis that cate- 
gorically claims that ambient optic array in- 
formation is sufficient to produce the percep- 
tion of motion and movement and need never 
be interpreted by being related to other events 
arising in the organism itself. By showing 
that percepts can vary while the information 
in the ambient optic array remains constant, 
Gibson's claim would seem to have been 
refuted. An argument which might still 
salvage this claim, namely, that differences in 
attentional processes might account for the 
differences in perception, would not seem 
valid in view of the fact that the animals'
behavior in all experimental conditions shows
that they were clearly attending to the
stimulus. A final argument that Gibson used
was that von Holst and Sperry were con-
cerned with retinal sensations which must be
corrected by the brain. Gibson always made
it clear that sensations are irrelevant to his
theory. Thus, it might be claimed that von
Holst and Sperry used data that might be
considered irrelevant to perception. Clearly,
however, the "sensations" with which these
authors are concerned are the result of the
optic array upon which Gibson based his
own theory. 

The studies cited thus far have dealt with 
situations in which reafference is associated 
with head movement or with locomotion. 
There are some suggestive studies in the 
area of eye movement as well, reported by 
von Graefe (1878), von Helmholtz (1925), 
and von Holst (1954). Von Helmholtz's
(1925) observation deals with the situation
in which the external rectus muscle of the 
eye is paralyzed such that the eye cannot 
move in a given direction. It is found that 
in a situation in which, say, the right eye 
cannot move to the right, a command to the 
subject to move his eye to the right will pro- 
duce a perception on his part of seeing the 
environment moving to the right. Again, then, 
here is a case in which there is perceptual
change without any actual change in the input
from the optic array.

The von Helmholtz data, while they may be
taken to support a position which said that
an extraoptic event—namely, the feeling of
innervation—enters into perception, have his-
torically become subject to two interpreta-
tions. One is the interpretation by von
Helmholtz, which said that there is central
information that affects the perceptual proc-
ess. The other interpretation, reviewed by
Festinger and Canon (1965), was proposed by
James (1950). James asserted that since it
is well known that both eyes act in concert
motorically, and since only one eye was im-
mobilized in the case of the von Helmholtz
experiment, the perceptual information of
motion could have come, not from a central
source to the immobilized eye, but from
proprioceptive information due to the normal
eye which was in motion during the
act. At the time James made his point, it was
generally assumed that, indeed, peripheral
stimulation could therefore account for the
results obtained by von Helmholtz. On the
face of it, therefore, the von Helmholtz data
need not be taken to contradict Gibson's
theory. However, later developments have
changed the validity of James' argument.
These developments are now reviewed.

Whitteridge (1962) raised the question
"whether the position of the eyes enters into
judgments of position and movement, and if
it does, how far proprioceptors are respon-
sible [p. 511]." A direct test of the question
whether there is useful proprioceptive infor- 
mation from the eye’s extraocular muscles, 
and therefore a test of the respective theories 
of von Holst and James, was made by 
Brindley and Merton (1960). These authors 
anesthetized the surface of the eyes as well 
as the inner surface of the eyelids. Moreover, 
they covered the corneas of the eyes with 
Opaque caps so that no visual information 
was available to the subjects. They then 
mechanically moved the eyes singly as well 
as in concert and found the subjects were 
unaware that any such movement had taken 
place. They concluded that no information 
about the position of the eye is derivable 
from the sense endings of eye muscles. 

A study by Festinger and Canon (1965),
in turn, was designed explicitly to ascertain 
whether information obtained from a record 
of certain outgoing motor nerve impulses is 
available and can be used by the organism. 
The hypothesis was tested that the organism 
knows the direction of the eye from knowing 
where it has been directed to go. Experimen- 
tally, this means that conditions had to be 
created in which, in one condition, the eye
became directed at a certain point in space
because of a specific efferent command and,
in another condition, without such a com-
mand. To accomplish this, Festinger and
Canon (1965) availed themselves of findings
by Rashbass (1961) on the differences be-
tween saccadic and smooth tracking eye
movements. The former's function, according
to Rashbass, seems to be to move the eye
to a specific location; the function of the
latter is to match velocities of a moving
target regardless of the target's position. The
experiment thus involved having subjects
fixate on a suddenly appearing target versus
having them track a slowly moving target to
a fixed position in space. In both experimental
conditions, subjects had to point out the
location in space of the target by using a
pointer operated by their index finger. The
efferent instructions are different in these two
conditions, while the proprioceptive informa-
tion is the same. The major findings are that the accu-
racy of pointing at the location of the target
is very significantly better in the condition in 
which the target suddenly appears than in the 
condition where the target slowly moves 
toward its ultimate destination. This finding 
supports Festinger and Canon's theory that 
efferent, that is, outflow, information is both 
available and useful for the subject in making 
perceptual judgments about radial direction. 
An alternative hypothesis, that subjects in
the "saccadic condition" were attending to
position cues—and thereby obtained extero-
ceptive information—whereas the subjects in
the smooth tracking eye-movement condition
did not attend to these cues, is eliminated
by the fact that there were no such cues,
since the experiments were performed in
the dark.

The upshot of all of this is that the data 
reported by von Helmholtz do not seem ex- 
plainable in terms of purely peripheral excita- 
tion and that other mechanisms such as those 
proposed by von Helmholtz or by von Holst 
should at least be considered. A von Holst 
type explanation would be as follows: In a 
stationary environment, efference to the eye 
to move to the right is normally accompanied 
by a movement of the retinal image from 
right to left. Since, in fact, no such move- 
ment occurred in Helmholtz’ experiment, the 
organism perceives a compensatory movement 
by the environment from left to right.

The study by Helmholtz has been elabo-
rated further by von Holst (1954). In addi-
tion to performing the experiment reported
by Helmholtz, von Holst carried out two
additional and related studies to further test
his theory. In one experiment, instead of com-
manding the subject to move his eye, he
turned the subject's paralyzed eye mechani-
cally to the right. He argued that in this
case there is no efference nor is there an
efference copy. There is, however, afference
which now, unmatched by an efferent copy,
is transmitted to a central integrating mecha-
nism, according to the theory. The theory
would predict that a perception of movement
of the environment to the left will ensue,
and it does. In a final study, the paralyzed
eye is commanded to move and is also moved
mechanically—thereby in effect simulating
the voluntary movement of a normal eye, in
which case there is both efference and reaf-
ference. In this case, the Helmholtz effect and
the effect of von Holst's second study cancel
each other, and the environment is perceived 
as stationary. By means of this triplet of 
studies, one of which is a replication of the 
experiment by von Helmholtz, von Holst has 
thus demonstrated nicely the function of the 
relation between optic and extraoptic vari- 
ables in perception. He has shown that the 
perception of no motion by the environment 
is based on an organism-environment rela- 
tion, and his theory also explains the illusion 
of motion and, more precisely, the specific 
form that illusion takes. 
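
The logic of this triplet of studies can be written out as a short sketch. The Python fragment below is an illustration only, under an assumed coding that does not come from von Holst: a command to move the eye to the right is +1, the retinal displacement that normally accompanies an eye movement to the right is -1, and the sign of their sum gives the perceived motion of the environment. The names `perceived_environment_motion` and `CONDITIONS` are hypothetical.

```python
def perceived_environment_motion(efferent_copy: int, afference: int) -> str:
    """Return the percept predicted by summing efferent copy and afference.

    Assumed coding: +1 = command to move the eye right,
    -1 = retinal image shift normally produced by a rightward eye movement.
    """
    total = efferent_copy + afference
    if total > 0:
        return "moving right"
    if total < 0:
        return "moving left"
    return "stationary"


# (efferent_copy, afference) for each of the three studies described above.
CONDITIONS = {
    "paralyzed eye commanded to the right": (+1, 0),
    "paralyzed eye turned mechanically to the right": (0, -1),
    "paralyzed eye commanded and turned mechanically": (+1, -1),
}

if __name__ == "__main__":
    for label, (efferent_copy, afference) in CONDITIONS.items():
        percept = perceived_environment_motion(efferent_copy, afference)
        print(f"{label}: environment perceived as {percept}")
```

Under this coding the first two conditions yield opposite illusions of motion and the third yields a stationary environment, which mirrors the cancellation described in the text.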

Still further studies by von Holst (1954) 
are also relevant. These deal with the system 
of visual accommodation rather than explora- 
tory movements. If the circular muscle which 
allows the lens to round up is narcotized, 
vision will be accommodated for distant ob- 
jects. Any intention for near accommodation 
will start a motor impulse which cannot be 
nullified by any change in reafference, and, 
hence, as in the previous cases with the von 
Helmholtz effect, certain predictable percep-
tions should ensue. That is, projections will
remain the same size on the retina in spite
of accommodation commands, and it has long
been known that under these conditions all
objects in the visual field come to be seen
as small. Von Holst argued that a similar set
of phenomena can be produced in the case of
a normal eye via the process of afterimages.
If an afterimage is produced of a distant
object which is then projected on a near
surface, accommodation is again accompanied
by a retinal projection which remains constant
in size. Under these conditions, the object
appears very small on the near surface. Von
Holst argued that these false perceptions
appear although the peripheral stimulus situ-
ation is unaltered. If, now, the peripheral
stimulus situation does become altered and
the accommodation remains unaltered—as
when the eye looks first at a small and
then at a large cross at the same distance—
the large cross looks larger. If, again, the
situations of these two experiments are
combined, and the eye looks at a given ob-
ject being moved nearer to it, the object is
perceived as being constant in size.

Whether this particular set of three studies 
is necessarily in support of an efferent- 
reafferent theory, and not in support of 
Gibson's theory, is perhaps arguable. That is,
it may well be that optic, in addition to 
efferent, cues were present by which the 
subject could have measured distance. 

Von Graefe (1878) reported cases of pa-
ralysis of the rectus muscle of the eye in
which the subject can move his eye only over
an angle of, say, 20 degrees to the right. If
such a subject is asked to look at an object
which is at this given angle of 20 degrees from
his straight ahead, the effort to do so is much
greater than it would be normally. The pre-
diction made by a Sperry or von Holst type
of theory would be that this circumstance
should affect the subject's perception of radial
direction. Such is indeed the case. Under these
conditions, subjects will judge an object at
20 degrees to the right of straight ahead to
be located at an angle much greater than
that. In fact, the angle indicated by such subjects
is approximately that which a normal eye
would have traversed had it been moved to its
extreme right position.

Further studies showing that information
about the voluntary motion of the eye and
its relation to reafferent input is important in
perception have been concerned with the
problem of the stabilized retinal image. For
example, Festinger, Burnham, Ono, and
Bamber (1967) discussed a novel way of
looking at the situation produced by the
stopped or stabilized retinal image (Ditchburn
& Ginsborg, 1952). What happens under these
conditions, according to Festinger, is that the
ordinary relationship between efference and
the response-produced stimulation or reaffer-
ence, which in this case is the relationship
between motion of the eye and a resultant
change of position of retinal input, is de-
stroyed. The effect of this treatment, which
is temporary fading of the retinal picture, has
traditionally been interpreted as neural
"fatigue" or adaptation (Ditchburn & Gins-
borg, 1952). The assumption that perception
involves centrally available information about
the organism's voluntary activity, however,
allows consideration of a competing hypothe-
sis. This hypothesis says that fading results,
at least in part, because of the disruption of
the efferent-reafferent relationship that ob-
tains when eye movements no longer produce
specific transformations of retinal inputs.
Festinger proposed that if the stabilized image
could be moved in a manner which is un-
related to any movement by the eye—say, by
having an experimenter do the moving—the
competing hypotheses should be testable. If
fading effects continue under the latter treat-
ment, the reafferent hypothesis would be
strengthened since the neural adaptation ef-
fect should be minimal under these condi-
tions. Observations reported by Campbell and
Robson (1961), using the shadows of retinal
capillaries which are moved across the retina
at certain amplitudes and frequencies, seem
to support this hypothesis. That is, the per-
sistent fading of the capillary shadows re-
ported by Campbell and Robson (1961) can-
not be explained entirely in terms of fatigue
or satiation of neural mechanisms. Festinger
et al. (1967) also mentioned a study by Tepas
(1962) in which complete "blank-out," that
is, a complete disappearance of the sense of
vision on the "ganzfeld," was reported by sub-
jects at the same time when they also mani-
fested an absence of saccadic eye movements.
The notion that centrally available information
about voluntary activity is necessary
in perception receives further support from a
series of studies on perceptual learning. For
example, studies by Festinger et al. (1967)
are a direct test of the hypothesis that centrally
available information about its own
activity, about efference, is essential to the
organism's visual adaptation to distortions
produced by prisms. The major idea here is
that if perception involves the building up of
efferent-reafferent correlations and if distorting
prisms require a systematic shift in such a
correlation, adaptation to prism-induced distortions
in contour should be facilitated by
the presence of central efferent participation
of the central nervous system in the adaptation
process. In the experiment, prisms were
used to induce curvature to objectively straight
contours and straightness to objectively
curved contours. The contours in most cases
were parallel metal bars. In some experi- 
ments, during adaptation training a stylus 
had to be moved between the bars. In another 
study, the eye, now itself fitted with a contact 
lens on which a prism was mounted, had to 
focus along the length of the contours. In an 
additional experiment, a different experimental
task (a “shooting gallery”) was used, and
arm movement had to be used to aim a light 
gun. In all experiments, the two major varia- 
tions for comparison were (a) to move a 
stylus, shoot the “gun,” or move an eye as 
a result of contour-specific efferent commands 
to move along specific paths in the field (e.g., 
to learn a smooth, fast, sweeping motion of 
the stylus between the apparently curved par- 
allel rods); (b) to move without making 
contour-specific efferent commands regarding 
direction (e.g., to have the eye follow a target 
moving between apparently curved parallel 
rods, to learn a smooth stroking motion which 
must at all times maintain contact with one 
rod, etc.). The first variation, according to 
the authors, constituted an efferent or 
“learning” condition, while the second was 
designed as a nonefferent or “accuracy” 
condition. 

The major findings are that the “efferent” 
conditions do lead to significantly greater 
adaptation. (Adaptation being defined as the 
extent to which “apparently curved” but ob- 
jectively straight is judged as less curved 
after the experiment and, similarly, the extent 
to which “apparently straight” but objec- 
tively curved is judged as less straight.) This 
is measured by ascertaining changes in the 
setting of the metal rods to straight by the 
subjects wearing both prisms and plain glasses 
in pre- and posttests. In several of the experi- 
ments, the efferent processes induced (i.e., 
moving a stylus with the arm and hand) are 
not as directly relevant to the final measure 
of visual adaptation to distorted contours as 
is an efferent process in which eye movement 
itself is involved. It is interesting to find, 
therefore, that Festinger et al. (1967) showed 
that one experiment which involved moving 
the eye along the contours, rather than the 
stylus, produced the highest (approximately 
40%) adaptation. This work has since been
replicated and extended by Burnham (1968),
Gyr and Willey (1970), and Slotnick (1969). 

A suggestive set of evidence in regard to 
the hypothesis that central processes having 
to do with the organism's voluntary activity 
are necessary to perception has also been 
supplied by the studies on the relation be- 
tween perception and voluntary motion con- 
tributed by Held (1964), Held and Bossom 
(1961), Held and Hein (1963), and Hein and 
Held (1962). Held (1964) argued strongly 
against the assumption that 


the neurological processing of sensory input is inde- 
pendent of the organization of motor action, that is, 
against the assumption that the analysis of sensory 
input preparatory to motor acts occurs solely in the 
course of a chain of neural events traversing the 
sensory projections but completed prior to impinge- 
ment upon motor centers in the brain [Held, 1964, 
p. 141]. 


In studies of perceptual displacement, Held 
and Bossom (1961) have shown that volun- 
tary motion (as opposed to passively being 
moved about but ostensibly receiving the same 
visual inputs), with its concurrent train of 
sensory and motor feedback, provides the es- 
sential order required for compensation in 
such an environment. That is to say, spatial 
orientation in such an environment was only 
made if voluntary motion was allowed. 
Further evidence quoted from Held (1964) 
stated: 


Hein and Held have reared kittens with one eye
open during locomotion in an illuminated surround- 
ing; the other eye was open during passive transport 
over an equivalent path. After several months of 
such exposure, stimulation of the eye that had been 
open during active movement produced normal 
visually-guided behavior but the other was functionally
blind. These experiments clearly implicate
the motor system in processes traditionally regarded
as sensory [pp. 308-309].


Held and Hein (1963) found that self-produced
movement is necessary for the development
of visually guided behavior such
as on the “visual cliff” (Walk & Gibson,
1961). Only those cats which had been allowed
voluntary motion while given controlled input
of patterned vision during training
showed behavioral evidence of depth discrimination
when put on the “visual cliff,” or
evidence of visually guided paw placement. In
the test involving visually guided paw placement,
the subject's body was held in the
experimenter's hands so that its head and 
forelegs were free. It was then slowly carried 
forward and downward toward the edge of a 
table. A normally reared animal shows visu- 
ally mediated anticipation of contact by ex- 
tending its paws as it approaches the edge. 
Held and Hein (1963) stated that peripheral 
atrophy resulting from lack of use of various 
organs is contraindicated by the presence of
pupillary and pursuit reflexes and the rapid
recovery of function of the passive subjects
once given their freedom. Debility specific to
the motor system can be ruled out, according 
to the authors, because the passive subjects 
showed the same tactual placing responses 
and other motor activities as the normals. 
Singer and Day (1966a, 1966b) reported
findings which, they claim, contradict some
of the results obtained by Held and his collaborators.
The data of Singer and Day show
that adaptation to prism-produced distortion
of localization occurs under both passive and
active movement conditions. However, some
things are worth pointing out about the
studies by Singer and Day: (a) The passive
condition instituted by Singer and Day
(1966a) does not involve a passive movement
of the subject’s arm by the experimenter
during training. Rather, the subject
moves his passive arm with his own free hand
by turning a mechanism designed for the purpose
of moving the passive arm. Under these
conditions, it is not at all unlikely that relevant
central interaction between the two arms
may have occurred. It would at least seem
worthwhile for future research to eliminate
this potentially confounding condition [...]
([...]man, 1963; Held & Gottlieb, 1958; [...]).
(b) A further comment to be made about the
Singer and Day (1966a, 1966b) studies reviewed
here is that it is clear from inspection
of their tables that, for both adaptation
and aftereffect, all experimental conditions
that involve some active movement show
a greater adaptation effect, though not statistically
significant, than the condition in
which there was no active movement. This
fact would seem to point to the possibility
that the reported divergence of their data
from data reported elsewhere in the literature
may be quantitative rather than qualitative.
(c) Singer and Day (1966b) reported that
the requirement to make judgments about
position is the most effective method for
raising adaptation. It should be noted, first
of all, however, that making judgments in
the absence of any kinesthetic participation
in the process (Singer & Day, 1966b, Experiment
II) produced relatively little adaptation
and, statistically, no more than under conditions
which did not require the subject to
make judgments. Judgment by itself, therefore,
does not seem to have a powerful effect
on adaptation. Second, it may be asked what
precisely the so-called judgment condition
consisted of. Operationally, the subject during
training was required to set a bar such that
it looked, as well as felt, horizontal. He was
to do this seven times. One thing that might
be happening during the so-called judgment
phase is heightened activity of the system of
the eye. Hay and Pick (1966) demonstrated
this process to be ongoing in adaptation, and
this alone could account for the adaptation
observed. Moreover, this kind of adaptation
could take place along the lines indicated by
Held. In brief, the judgment studies of Singer
and Day do not necessarily conflict with the
assumption that central processes are involved
in perception.
Weinstein, Sersen, Fisher, and Weisinger
(1964) also have expanded on some of the
studies originally inspired by Held and his
collaborators. They were interested in adaptation
to displacements of the straight ahead.
During the exposure period, all subjects sat
in wheelchairs and either wheeled themselves
around or were pushed around a corridor for
one hour. Either condition was further subdivided
according to whether or not the subject
was allowed to decide where to go. Weinstein
et al. reported significant adaptation in
all conditions, but somewhat more in the
conditions involving direction by the subject.
The picture left by these results, it would
seem, is that either self-initiated movement,
or decisions about what movements to make,
or the requirement for the subject to make 
judgments about specific events leads to 
greater adaptation. At least one element that 
seems common to all of these is that the organism
must in some way actively confront the
production and/or the reception of input from 
the environment. In this connection, it is 
worth noting that Festinger et al. (1967) and 
Taylor (1962) argued that perception need 
not involve so much actual motor activity as 
a readiness to activate some programmed 
efferent signal. 

All of the above studies seem to point to 
the importance of extraoptic, central param- 
eters in perception. The evidence casts serious 
doubts on the completeness of Gibson's theory 
of direct visual perception. The theory of a 
relation between optic and central nonoptic 
variables, which has been proposed by several 
of the above researchers, is not the same as 
Gibson's conception of the substitutability of 
optic and nonoptic parameters. In the latter 
case, information, say, about the organism's 
own actions may come either from propriocep- 
tors or exteroceptors. In the former case, both 
optic and central nonoptic parameters are re- 
quired to form a relation; one cannot be 
substituted for the other. 


REVISION OF SOME ASSUMPTIONS
ABOUT PERCEPTION 


This section spells out new ideas which 
modify the outlook that information from the 
ambient optic array is sufficient for perception,
and that information about the organism's
actions comes essentially from peripheral
feedback from such actions, be it
proprioceptive or exteroceptive. 

It appears from the studies cited in the 
previous section that organisms are not quite 
the passive receivers of stimulation implied 
in Gibson’s theory. To use the term “passive” 
in connection with Gibson's theory may seem 
strange in view of Gibson's insistence that 
the organism seeks information and is motori- 
cally active while perceiving. Nevertheless, 
Gibson’s theory of perception assumes a pas- 
sive process in the sense that the organism 
itself does not contribute to the perceptual 
information through its voluntary activity. 
The findings reported in the previous section
suggest a different story, however. Theories
proposed by von Holst, Sperry, Held, 
Festinger, etc., have it that in addition to 
receiving optic input, the organism in some 
way also monitors its own behavior emitted 
to produce such input. Specifically, informa- 
tion from monitored behavior interacts with 
incoming optic input and interprets it via the 
“efferent copy” or an equivalent mechanism.
Note again that the stress of an interaction
between events internal to the central nervous
system and afferent inputs from the ambient
optic array in no way implies a position, 
criticized by Gibson (1966), that sensory in- 
formation needs to be "corrected" by the 
brain. Von Holst, Sperry, etc., could easily 
accept Gibson's (1968) crucial and produc- 
tive distinction between image optics and eco- 
logical optics while nevertheless insisting that 
information available from the latter is not 
sufficient for perception. 

As a model to account for the data, the 
theories proposed by these and other re- 
searchers handle findings that are not explain- 
able by Gibson's theory. Still, the question 
may be raised whether there is any direct 
evidence from neuropsychology that can be 
marshaled in support of these kinds of theo- 
ries or, alternatively, whether there are data 
that might support a competing but neverthe- 
less “active” (in the above sense) hypothesis. 
It appears that there are such data. 

Research that is relevant has explored 
whether self-produced voluntary movement or 
instrumental activity can be learned without 
the benefit of peripheral proprioceptive or 
exteroceptive feedback. The kind of move- 
ment or activity under consideration here is 
that which is involved when the organism provides
itself with new stimulation and consequently
new perceptions by the performance
of motor acts. It is what Konorski
(1967) called Type II conditioning involving
response-contingent stimulation. If an organism
can, at a central level, be aware of and
regulate its own behavior, the organization
of such behavior would not have to presuppose an
already independently established peripheral
[...]. Under these conditions,
there is no reason to favor the assumption
that movement or activity is controlled by, or
in the service of, perception. The exact opposite
or, better, the mutual dependence between
these two would be an equally good
hypothesis. Perception thus might be a proc- 
ess from the outside in as well as from the 
inside out. Voluntary action and perception 
could be mutually organizing each other. This 
would suggest that the theory proposed by 
von Holst and others could be tenable. 
Evidence that voluntary activity can be
organized relatively independently from peripheral
input has been accumulating over a
number of years. The evidence in question
comes from the research on deafferented animals.
This work has been performed by
numerous researchers producing largely similar
results on two continents, and it has been
summarized nicely by Konorski (1967) and
Taub and Berman (1968). To anticipate the
results, it may be stated that this work shows
that voluntary motion is possible without essentially
any peripheral feedback taking place.

The deafferentiation work takes off from
early findings reported by Mott and Sherrington
(1895) and Sherrington (1931) that deafferentiation
of a single limb leads to
incapacitation of that limb for any purposeful
activity. These findings seemed to support
Sherrington's theory of the reflex arc based
on the findings of the afferent and efferent
spinal neuron. This research has since been
superseded by very interesting further studies
which have explored conditions under which
deafferented systems do manifest purposeful
activity. The newer findings have led to
a revision of the Mott and Sherrington
interpretation.

The research that is reported was initially
designed to insure, by deafferentiation of relevant
parts of the spinal cord, that the animal
does not get any direct proprioceptive
feedback from its own voluntary movements.
Subsequent research has also become concerned
with eliminating all indirect peripheral
feedback. Examples of such indirect feedback
are the sound of a buzzer which terminates
as the movement which is designed to avoid
shock is initiated, the distension of [...]
which occurs as the deafferented [...],
stimulation of the middle ear during [...]
activity, etc. The concern with the elimination
of indirect peripheral feedback from the
animal's own activity has led later experimenters
to expand the deafferentiation so as to wipe
out most of the somatic and interoceptive 
feedbacks. Moreover, there has been an inter- 
est not only in gross but in fine-grained 
movements. Finally, trace conditioning has 
been used in an attempt to eliminate associ- 
ations between voluntary movement and the 
conditioned stimulus. 

It may be pointed out that findings con- 
trary to those reported by Mott and Sher- 
rington (1895) and Sherrington (1931) first 
appeared when experimenters began to use 
both fore or both hind limbs in producing 
voluntary instrumental activity or, if they 
used one limb only, the other limb was re- 
strained. Earlier studies by Sherrington had 
used one limb only, while leaving the other 
limb free. Typically, in these newer studies 
animals like monkeys, cats, dogs, or rats are 
trained to produce instrumental responses in- 
volving one or two limbs or, alternatively, 
a fine grasping movement. Instrumental responses
are used both to produce positive or
to ward off negative effects. This training may
either precede or follow the process of eliminating
surgically, and otherwise, all of the
relevant peripheral feedback. Normally, in
addition to studying the animal under conditioning,
it is also studied in a “free” situation
to observe if normal functioning of the limb,
singly and in concert with other body parts,
will return after deafferentiation. The [...]


Knapp, Taub, and Berman (1958, 
1963, [1964]?); Taub, Bacon, and Berman 
(1965); Taub and Berman (1963); Taub, 
Ellman, and Berman (1966); Wynne and 
Solomon (1955).
The number of studies involved precludes a
detailed review here. Instead, only a summary
of the major findings is attempted.
These are as follows: (a) Under conditions
either of partial deafferentiation to prevent
feedback from a given instrumental limb or
pair of limbs, total deafferentiation plus blindfolding,


2 E. Taub & A. J. Berman. The effect of massive
somatic deafferentiation on behavior and wakefulness
in monkeys. Paper read at the meeting of the Psychonomic
Society, Niagara Falls, Ontario, October 1964.


or under conditions of the combined
surgical and pharmacological interruption of 
the autonomic nervous system, the animals in 
question can be trained to produce the instru- 
mental response, or, if trained previously, the 
animal retains the instrumental response or it 
can be retrained. (b) Both gross instrumental 
responses such as forelimb flexion and fine 
instrumental behavior such as finger move- 
ment are preserved or can be trained under 
the above conditions. (c) In the “free” 
situation and for both the partially deaffer- 
ented and totally deafferented animals there 
is total restoration of function for the mon- 
keys. Partially deafferented animals, for ex- 
ample, become successful at climbing the wire 
mesh fence within 2-6 months. Totally de- 
afferented animals showed restoration of func- 
tioning of the forelimb. Because of the lack 
of sufficiently long-term survival of the totally 
deafferented animals, hind-leg functioning was
not restored, and it is therefore unknown if
such restoration can be effected. (d) The only
restriction to be placed on the findings under 
Item c above is that, if only one limb is de- 
afferented, the animal's unaffected limb must 
be restrained during training. If it is not, the 
findings reported for the “free” situation 
under Item c above do not occur. The expla- 
nation for this, which is given by Taub and 
Berman (1968), is that the movements of the 
unaffected limb have an inhibitory effect on 
the other limb which, due to the operation, 
is no longer held in check by the ipsilateral 
segmental afferent inflow. This problem does 
not arise, however, when both limbs are 
deafferented. (e) If a totally deafferented 
animal, which is deprived of all somatic 
feedback as well as all feedback from sym- 
pathetic and sacral parasympathetic path- 
ways, is now also deprived of feedback from 
most of the cranial parasympathetic system, 
avoidance responses are still given. Elimina- 
tion of all feedback tends to put the animal 
to sleep, however. 

The above studies show that response-produced
feedback is not necessary for instrumental
conditioning (Konorski, 1967;
Taub & Berman, 1968). Konorski (1967) 
has shown elsewhere that such feedback is 
not sufficient—that is, he reported research 
on the general conditionability of passive
movements and movements obtained by brain
stimulation and predicted that these should 
not be conditionable on the assumption that 
peripheral feedback is neither necessary nor 
sufficient. As to the former, Konorski (1967) 
concluded that passive movements cannot be 
used in instrumental conditioning (Woodbury, 
1942). Konorski agreed that there seem to 
be exceptions to this rule, as in some experi- 
ments on larger animals like goats and big 
dogs. Exceptions also seem to occur in ex- 
periments in which animals are passively 
transported between a starting point and a 
goal but nevertheless are able to use this
experience to proceed thereafter on their own 
to a goal box (Beritoff, 1965). Finally, excep- 
tions seem to arise in certain “insight” 
experiments (Konorski, 1950). In each of 
these cases, however, Konorski suggested that 
more than purely peripheral information is 
available to the animals involved. Either the 
active intervention of reflexes such as myo- 
tatic reflexes are involved in the work with 
large animals; or responses are made by the 
animal which go unnoticed by the observer; 
or, in the case of "insight" experiments, well- 
known responses tend to be involved which 
already have extensive internal representa- 
tions that may be triggered off by, say, visual 
input and do not require feedback from actual 
locomotion. 

In the case of movement obtained by brain 
stimulation, Konorski (1967) concluded that 
all movements produced by stimulation of 
the motor cortex which are produced by the 
efferent part of a reflex arc only cannot be 
instrumentally conditioned. Konorski argued: 


only those movements can become instrumentalized 
which are accomplished by the intermediary of the
central nervous system—in other words, which have 
a reflex character in the broad sense of the word 
[Konorski, 1967, p. 485]. 


In short, only those movements can be instrumentalized
which are mediated by the central
behavioral system, that is, which are performed
by the organism itself and are not
forced upon it.


The cumulative impact of these studies is
to suggest that voluntary activity can take
place with little or no peripheral feedback
and that such feedback by itself is not sufficient
or necessary, except in a very marginal
way. Thus, there must exist a system at a
central level whereby the organism can be 
aware of its own activity, and this informa- 
tion must play a role in voluntary activity. 
This is the kind of system which von Holst 
(1954) perhaps alluded to and which is a
candidate for an explanation of the data 
reported in the first part of this article. 
Konorski (1967) referred to this central 
system which registers activity of the central 
nervous system under the rubric of kines- 
thesis. He distinguished kinesthesis from pro- 
prioception and exteroception on the basis 
that the latter can be controlled externally,
whereas the former cannot. Kinesthesis, to
Konorski, is concerned with the afferent system
which inputs information, not about externally
controllable events, but about activity
produced by the central nervous system
itself; it informs the central nervous system
about its own activity. The kinesthetic analyzer
which records muscular contractions is
placed by Konorski at the level of the cerebellum.
According to Konorski, the cerebellum
provides the basic input to the motor (kinesthetic)
cortex which thus obtains information
about complex movements from this
input. In turn, the cerebellum receives input
from that motor cortex, as well as information
from tendon and stretch receptors.
Taub and Berman (1968) also reported
research (Chang, 1955; Kuypers, 1960;
Levitt, Carreras, Liu, & Chambers, [...];
Li, 1958) that has bearing on a purely
central feedback mechanism
which returns information concerning future
movements to the central nervous system
before the impulses which will produce these
movements have reached the periphery,
thereby allowing the animal to determine the
position of its limb even when there are no
peripheral sensations. Taub and Berman
(1968) stated that the mechanism “would
seem to involve afferent collaterals from the
medullary pyramidal tracts to the nuclei
gracilis and cuneatus, thence back to the
cerebral cortex through ventralis posterolateralis
[p. 188].”

It appears that organisms do possess systems
at the level of the central nervous system
which tell them about their own movements.
Moreover, this central information seems to
be essential for the conduct of response-
contingent voluntary activity. These findings, 
coupled with the findings reported in the 
second part of this article which are not easily 
reducible to an interpretation similar to
Gibson's, would seem to make an explanation 
that involves both information from the optic 
array and, centrally, from the organism’s own 
activity at least potentially fruitful for per- 
ception theory. A modification of Gibson’s 
assumptions that information from the optic 
array is sufficient for perception would seem 
to be in order. The modification in general 
terms would have to involve the notion that 
perception requires the intermediary of the
central nervous system. The intermediary, 
that is, of an organism which decides and does 
not merely resonate. 


CONCLUSIONS 


It would appear that the separation be- 
tween perception and voluntary activity as 
two functionally distinct processes can be 
brought into question. Perception is not imposed
on a passively “resonating” organism.
Peripheral input and central inputs due to
the organism’s voluntary activity are both
involved and may, to an extent, even be
shaping or organizing each other, thus lending
a feature of self-organization to the perceptual
process. The hypothesis that behavior and
perception might be organizing each other
has been a feature of computer simulation
work in perception of Gyr, Brown, Willey,
and Zivian (1966), and the hypothesis was
proposed earlier by Platt (1962). Concrete
evidence of possibilities for “functional” self-organizing
activity of the brain (also proposed
by Hebb, 1948) has not been readily available
until recently. The empirical fact of
processes of functional organization in the
brain (at least in the intertectal space) has
recently been reported, however, by Gaze,
Keating, Székely, and Beazley (1970).

Conceived as an interaction between central 
and peripheral inputs and feedback, whether 
preformed or functional, perception may be
said to possess an “internal semantic” by
which the organism can bridge the gap between
the “inner” and “outer,” since these
two realms are, or can become, organized
together. By conceiving of an organism that
does more than resonate but, rather, compares
and decides, the model lends itself
better to explaining attention and selection 
than does a model that discusses the “outer” 
in terms that are not organically related to 
the “inner.”


WHAT THE TONGUE TELLS THE BRAIN


HARVEY M. SUSSMAN 1
The versatility of the human speech production system and an increasing body
of evidence suggest that speech is controlled by an intricate closed-loop feedback
system. To bring about feedback control of the speech musculature, the
higher neural centers should be kept constantly aware of (a) the spatial 
position, (b) the direction of movement, and (c) the rate of movement of the 
articulators. This review describes the feedback mechanisms existing within the 
tongue that can mediate such dynamic space-time information. The unique 
three-dimensional arrangement of the lingual muscle-spindle network is structurally
organized to operate as a built-in geometric reference system. This
network is capable of signaling higher brain centers as to the changing length, 
position, and rate of movement of the tongue during the articulatory motions 
of human speech. The short-latency, cervical dorsal root pathway conducting 
hypoglossal afferent information is described as well as the cortical projections 
of this complex, rapidly acting feedback system. The implications of these 
neurophysiological findings support a phonetic-target-oriented theory of speech 


production. Neural receptors [...] coarticulation, [...]


Perhaps the most characteristic and dis- 
tinguishable motor trait of man is his unique 
ability to produce speech. This immensely 
complicated sensorimotor behavior has been 
the subject of many theoretical and experi- 
mental investigations, but, to date, no satis- 
factory theory of the motor control of speech, 
based on neurophysiological reality and fact, 
has been achieved. In fact, there has been a 
systematic divergence between speech produc- 
tion theories and neurophysiological research, 
with the former heading more and more 
toward an open-loop position while the latter 
builds a stronger and stronger case for a 
closed-loop position. This study represents
an introductory effort to decipher the neuro- 
coding controlling the muscular movement se- 
quences of speech and to promote a closed- 
loop feedback theory explaining the moment- 


1 Sincere gratitude is extended to Peter F. MacNeilage,
Curtis E. Weiss, and Kim Oller for their
constructive comments during the preparation of
this article and to Karl U. Smith whose theoretical
insights stimulated the inception of the article.
Sussman, Department of L
concurrently changing contour and apes any
other muscle system in the body. Any aighe"
musculature; (b) to trace the afferent route
from the intrafusal muscle fibers to the higher
brain centers; and (c) to describe how the
proprioceptive and kinesthetic senses can be
mediated by the tongue muscle-spindle arrangement.
As Bowman and Combs (1968) stated:


The need for a rapidly acting monitoring and con- 
trol mechanism is particularly apparent in the case 
of human tongue activity, for this structure appears 
to execute among the most highly refined, complex, 
and rapid movements of any muscle group in the 
body during the production of the patterned sound
sequences characteristic of adult speech [p. 105].


Before discussing the neurophysiological
structures and pathways contributing to the
control of tongue-muscle activity, some
behavioral correlates illustrating closed-loop
control of speech are treated.


BEHAVIORAL EVIDENCE FOR CLOSED-LOOP
MOTOR CONTROL OF SPEECH


Since the appearance of Fairbanks’ (1954)
servosystem model of speech production, much
controversy has existed as to the nature of
the motor control of speech. This controversy
has centered around the open- versus closed-loop
control of speech. Recently, speech scientists
have been able to lend behavioral support
to the ever-increasing collection of
neurophysiological findings illustrating an extensive
feedback system existing within the neural
substrate regulating skeletal muscle activity
(Eccles, Eccles, Iggo, & Lundberg, 1960;
Eldred, Granit, & Merton, 1953; Granit,
1955; Granit & Henatsch, 1956; Houk &
Henneman, 1967; Matthews, 1964; Merton,
1953).

The initial experimental work on the sensory
feedback control of speech manipulated
the parameters of auditory and tactual feedback
and investigated how these disturbed
feedback channels affected the articulatory
process and the acoustic characteristics of the
speech signal. The work of Lee (1950a,
1950b), Fairbanks (1955), McCroskey
(1958), and Ringel and Steer (1963) is
representative of the research performed in this
area. The overall results illustrated the
disruptive influences caused by delayed or
masked auditory feedback on the frequency,
intensity, and durational (prosodic)
characteristics of speech and the disruptive effects
of oral sensory deprivation on the articulatory
characteristics of speech generation.

Relatively recent speech science research 
has made attempts to study the kinesthetic 
feedback channel as related to dynamic speech 
production. Such work has been largely in- 
ferential, but new developments in selective 
anesthesia and differential nerve blocks hold 
promise for more direct lines of sensory feed- 
back investigations of speech activity. 

Upon examining the movement character- 
istics of the jaw during ongoing speech, Mac- 
Neilage (1970) found evidence for closed- 
loop control during the initiation of speech 
utterances. He found that the jaw converged 
on a relatively invariant initial target position 
from varying prespeech origins. To achieve 
the speech initial target, the motor system 
must necessarily take account of the variable 
prespeech jaw positions. Only a neuromotor 
command system, constantly informed as to 
the moment-to-moment positional and rate of 
movement information of one of its com- 
ponents, can accomplish this high state of 
dynamic movement control as exemplified in 
jaw positioning during speech. 
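
The convergence argument can be made concrete with a toy simulation. The sketch below is not drawn from MacNeilage's data: the target value, gain, step count, starting positions, and function names are all assumptions chosen for illustration. It contrasts an open-loop command (a fixed displacement issued regardless of the starting position) with a closed-loop command (a correction proportional to the sensed positional error).

```python
# Toy comparison of open-loop and closed-loop positioning of a single articulator.
# All numbers (target, gain, steps, start positions) are illustrative assumptions.

TARGET = 10.0  # hypothetical speech-initial jaw position, arbitrary units


def open_loop(start, command=10.0):
    """Apply a fixed, preprogrammed displacement without sensing position."""
    return start + command


def closed_loop(start, gain=0.5, steps=20):
    """Repeatedly correct position in proportion to the sensed error."""
    position = start
    for _ in range(steps):
        error = TARGET - position   # feedback: distance still to go
        position += gain * error    # corrective command scaled by the error
    return position


def spread(values):
    """Range of final positions across starting conditions."""
    return max(values) - min(values)


starts = [0.0, 2.0, 5.0]            # varying prespeech origins
open_finals = [open_loop(s) for s in starts]
closed_finals = [closed_loop(s) for s in starts]

print("open-loop finals:  ", open_finals)
print("closed-loop finals:", [round(p, 3) for p in closed_finals])
```

In this sketch the open-loop end points inherit the full spread of the starting positions, whereas the closed-loop end points collapse onto the target, which is the pattern of convergence the jaw data are taken to indicate.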

Additional findings by Kozhevnikov and 
Chistovich (1965), Ohala, Hiki, Hubler, and 
Harshman (1968), MacNeilage (1970), and 
Sussman and Smith (1971) revealed a differ- 
ential velocity for jaw closure dependent upon 
vowel context. For the low front vowel /æ/, 
the jaw executed a faster rate of return in 
closing as compared to the high front vowel 
/i/. A closed-loop feedback theory would ex- 
plain such movement compensations as being 
due to peripheral feedback mechanisms (e.g., 
tendon receptors, muscle spindles) signaling 
the higher neural centers as to the extent of 
jaw depression during the opening phase of 
the vowel segments. A neuromotor command 
for a faster elevation of the jaw can conse- 
quently be given on the basis of the immedi- 
ately preceding state of the articulator. 

A recent study by Abbs³ illustrated the
disruptive effects of a bilateral mandibular
block on the integrity of jaw movements. By
carefully controlling the amount of
intramuscular anesthetic and the timing of the
sensory return of light touch and pain, it was
possible to selectively anesthetize the
gamma-efferent nerve fibers innervating the muscle
spindles of the jaw. The movement records of
the jaw under this type of sensory feedback
blockage showed gross loss of fine positional
control. A consistent reduction in jaw velocity
and acceleration was observed, and the
entire temporal patterning of jaw movements
was significantly altered. These findings stress
the importance of normally operative
muscle-spindle receptors to bring about precise, finely
coordinated, neuromotor control of the speech
musculature.

3 J. H. Abbs. Use of a differential nerve block in
speech physiology. A Forum on Speech Communication.
Paper presented at the meeting of the Acoustical
Society of America, Chicago, May 1970.


NEUROPHYSIOLOGICAL EVIDENCE FOR CLOSED-LOOP
MOTOR CONTROL OF SPEECH


A muscular structure such as the tongue, 
possessing “the highest ratio of nerve fiber to
muscle fiber of all the muscles except the eye
muscles [Silverman, 1961, p. 368],” must depend
on some type of control mechanism to
bring about precise motor coordination during 
speech production. Since the absence of a mov- 
able joint in the tongue musculature precludes 
the existence of Golgi-type tendon receptors 
(Mountcastle & Darian-Smith, 1968) to sig- 
nal position, the mediating role must be han- 
dled by other neural transducing structures 
capable of transmitting rapid and varied 
space-time information. The tongue possesses 
two such possible mediating systems in the 
intricate exteroceptive (touch endings) and 
muscle-spindle end-organ transducers. This 
study largely concerns itself with the latter 
neuromuscular mechanisms, but, in any com- 
prehensive theory of tongue-movement con- 
trol, both the exteroceptive and muscle-spindle 
systems must be considered. 


Histological Research on the Presence of the 
Tongue Muscle Spindle 


The existence of a muscle-spindle system in
the tongue has been a controversial issue
(Storey, 1967). The apparently conflicting
histological findings have been clarified since
it has been shown that there are differences
in the presence of tongue muscle spindles
among species.

The earliest reporting of the presence of a
muscle spindle in the human tongue was by
von Franque in 1891. Forster (1894)
supported this claim inasmuch as he found
spindles in the genioglossus muscle and posterior
sections of the human tongue. Ceccherelli
(1908) cited the existence of muscle spindles
in the human superior longitudinal muscle.
Working with the pig, cat, dog, opossum, and
rat, Langworthy (1924a, 1924b) reported the
presence of end-organ structures in the tongue
musculature. Tarkhan (1936) claimed to have
found muscle spindles in the rabbit's tongue.
Negative results in the search for the tongue
muscle spindle have been reported by Boyd
(1937, 1941), Carleton (1938), Weddell,
Harpman, Lambley, and Young (1940), and
Blom (1960). The negative findings have
been confined to histological work with cats,
rabbits, and rats.

Cooper (1953) has made an extensive histological
survey in an effort to localize muscle
spindles in the human tongue. Numerous
muscle spindles were found in the superior
longitudinal and transverse muscle groups of
the human tongue. They were described as
consisting of very fine, spindle-shaped [...]
capsules (1.0 millimeter in length and [...]
microns in maximum diameter) with numerous
intrafusal fibers. Spindles were found in the
inferior longitudinal and vertical muscles.
Larger muscle spindles having nerve [...]
at their poles were located in the extrinsic
genioglossus muscle. The muscle spindles
were not distributed equally over the [...]
musculature. They were found to be [...]
at the tip of the tongue, but plentiful in the
region proximal to the tip. The animal tongue
preparations revealed numerous spindles in
the rhesus monkey's tongue, but none were
found in the tongues of the cat or lamb.
Walker and Rajagopal (1959) examined the
extrinsic and intrinsic muscles of the [...]
tongue and reported the presence of
neuromuscular spindles in the genioglossus,
hyoglossus, and styloglossus muscles as well as in
all the intrinsic muscles. The greatest [...]
of spindles was localized in the [...]glossus
muscle. The primate tongue has recently
been examined by Bowman (1968),
and he reported muscle spindles in the
extrinsic and intrinsic musculature. The
spindle population was localized in the vertical
and transverse muscle groups, particularly in
the midregion of the tongue. Although the
spindles were not uniformly distributed
throughout a muscle, different intramuscular
regions were well supplied with spindles.

The above histological survey establishes
the credibility of extensive lingual
muscle-spindle populations in higher-order primates
and man. The debate over the existence of
muscle spindles in the tongue, therefore,
should be ended and the focus of attention
directed toward finding (a) what types of
information the muscle spindles are capable of
providing to the higher neural centers
concerning the speech motor process; and (b)
what nerve pathways carry this highly
encoded information.


Neurosensory Pathway


An integral part of a proposed
feedback-regulatory system for tongue-movement
control would be the centripetal pathway
conveying afferent sensory information from the
muscle-spindle end organs to the motor centers
of the hypoglossal nerve in the medulla
and to higher cortical regions. The exact
neuropathway for the intraoral proprioceptive
and kinesthetic senses has been a
controversial issue due to a questionable afferent
component within the essentially efferent
hypoglossal nerve.⁴

Numerous investigators (Barron, 1936;
Blom, 1960; Hensel & Zotterman, 1951;
Porter, 1965; Zotterman, 1936) have
reported that afferent signals from the tongue
leave the hypoglossal nerve along its
peripheral course and reach the central nervous
system through connections with branches of the
lingual nerve. Several investigators (Cooper,
1954; Downman, 1939; Green & Negishi,
1965; Nakamura, 1968; Whitwam, Kidd, &
Fussey, 1969) have found electrophysiological
evidence to support an afferent role for
the hypoglossal nerve without any peripheral
crossovers through the anastomoses existing
between the lingual and hypoglossal branches.

4 The neural innervation of the tongue (sensory
and motor) is comprised of the trigeminal
(lingual branch of the mandibular division), facial
(chorda tympani branch), glossopharyngeal, and
hypoglossal nerves. The hypoglossal nerve provides
the motor innervation to all the intrinsic and
extrinsic tongue muscles except for the palatoglossus.

A recent study supporting a complex role 
for an afferent component in the hypoglossal
nerve is that of Sauerland and Mizuno (1968). 
Working with the cat, they found evidence for 
the existence of a specific reflex between hypo- 
glossal afferents and the intrinsic laryngeal 
musculature. Electrically stimulating the 
proximal end of the severed hypoglossal nerve 
led to multisynaptic discharges in the cervical 
vagus and the recurrent laryngeal nerve in- 
nervating the muscles of the larynx. The exact 
path followed by the tongue afferents was il- 
lustrated by observing the fate of the reflex 
upon systematically severing the nerve. When 
the hypoglossal or glossopharyngeal rootlets 
were cut at the point of their entrance into
the brainstem, no change in the reflex occurred.
However, when the vagal or cranial
accessory rootlets were cut, or the fiber con- 
nections existing between the hypoglossal 
nerve and the nodose ganglion were destroyed, 
the reflex disappeared. Sauerland and Mizuno 
(1968) concluded: 


that hypoglossal fibers, carrying afferent volleys from 
extrinsic and intrinsic portions of the tongue, join 
the vagus at the level of the nodose ganglion and 
reach the brain stem via the jugular foramen [p. 
258]. 

It is apparent that the afferent pathway 
from the tongue to the brainstem consists of 
a hypoglossal-to-cervical nerve route and a 
hypoglossal-to-lingual nerve route. The next 
objective is to determine the nature of the 
information contained in these afferent pathways.


Neuromuscular Control of the Tongue 


The mobility of the tongue is brought 
about by a dual, spatially referenced, recip- 
rocally innervated, muscular arrangement. 
The muscles of the tongue can be classified 
into two general categories—the intrinsic and 
extrinsic muscles. The former have no direct 
attachments onto bone or other fixed struc- 
tural supports. They terminate within each 
other or in extrinsic muscle groups. Of cru- 
cial significance is the layout of the intrinsic 
muscles. All three spatial planes are represented
by specific extrafusal fiber groups comprising
the longitudinal, transverse, and vertical
directions. These three muscle masses can
assume an almost infinite array of shapes and 
sizes, but they cannot, by themselves, move 
the tongue. This function is handled by the 
extrinsic musculature that provides a type of 
“scaffolding” (Silverman, 1961) enabling the 
tongue to be moved about the oral cavity. The 
tricoordinate spatial reference system of the 
intrinsic tongue musculature is intricately in- 
volved in the mediation of tongue position and 
movement. 

An extensive study examining the response 
characteristics of tongue muscle spindles was 
undertaken by Bowman and Combs (1968) 
in an attempt to learn about the neural sub- 
strate controlling tongue muscle activity dur- 
ing speech. A mechanical pulley device was 
used to provide four independent stretch stim- 
uli (anterior, posterior, right, and left) to the 
tongue of a rhesus monkey. Fibers sensitive to 
stretch were identified in the main trunk, 
distal to the descending ramus, and in the 
lateral and medial end branches of the hypo- 
glossal nerve. Of more concern was the fact 
that all of the hypoglossal nerve units (over 
70 stretch-sensitive fibers) showed directional 
sensitivity. A preferential frequency alteration 
was revealed depending upon the particular 
direction of the stretch stimuli. Categorizing 
the response patterns of the hypoglossal units 
to the four stretch stimuli directions, Bowman 
and Combs distinguished 10 unique response 
patterns. Even within the same frequency re- 
sponse pattern to right-left and anterior- 
posterior stretch, many units were found to 
be differentially sensitive to one particular 
stretch direction. The units defined also
responded to different magnitudes of stretch
with different impulse frequencies. The
increment or decrement in discharge frequency was
directly proportional to the amount of
directional tension developed during the stretch. In
addition, several of the stretch-sensitive fibers
showed velocity responses during the dynamic
phase of the stretch. Such activity appeared
as an acceleration of discharge spike frequency
concurrent with the time period of active
lengthening of the muscle in which the unit’s
end organ was contained. Upon cessation of
the dynamic stretch phase, the spike
frequency of the unit settled to a lower value
concomitant with the new increased muscle
length. Similar but opposite findings
occurred for units displaying decreases in spike
frequency during active stretch. For these
units, the velocity response took place upon
the release of the stretch, signaled by an
acceleration of impulse frequency greater than
the poststretch release discharge rate. Thus,
specific nerve fibers were isolated that
contained afferent information signaling the
application and release of stretch stimuli. This
neural decoding of some of the movement
dynamics of the tongue presents a clear series
of neuromuscular events that can provide a
beginning explanation for the mediation and
control of articulatory maneuvers of the
tongue. Not only can the higher brain centers
be kept informed as to the initiation of a
high-speed consonantal gesture of the tongue
but also as to the attainment and subsequent
release of that gesture.

Thus far, the neuromuscular system of
the tongue has been shown to be a built-in
feedback system that can signal the [...]
and rate of movement of a muscle. What
remains to be explained is the positional or
directional information that the tongue
neuromuscular structure must provide to central
motor centers.
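
The static and velocity responses reported for units that increase their discharge during stretch can be pictured with a minimal ramp-and-hold sketch. This is an illustrative model only, not the analysis used by Bowman and Combs: the baseline rate, the two gains, and the function names are hypothetical, and the sketch does not cover the units that decreased their discharge during active stretch.

```python
# Toy firing-rate model for a stretch-sensitive unit during ramp-and-hold stretch.
# Baseline rate and both gains are illustrative assumptions, not measured values.

def firing_rate(length_change, velocity, baseline=20.0, k_static=4.0, k_dynamic=2.0):
    """Rate = baseline + static term (amount of stretch) + dynamic term (lengthening velocity)."""
    dynamic = k_dynamic * max(velocity, 0.0)   # only active lengthening adds a velocity response
    return baseline + k_static * length_change + dynamic


def ramp_and_hold(ramp_steps=5, hold_steps=5, step_size=1.0):
    """Return (length_change, velocity) samples: a linear ramp followed by a hold."""
    samples = []
    length = 0.0
    for _ in range(ramp_steps):
        length += step_size
        samples.append((length, step_size))    # muscle is actively lengthening
    for _ in range(hold_steps):
        samples.append((length, 0.0))          # the new length is held
    return samples


rates = [firing_rate(length, velocity) for length, velocity in ramp_and_hold()]
peak_dynamic = rates[4]   # last sample of the ramp
held = rates[5]           # first sample of the hold
print(rates)
```

With these assumed parameters the rate climbs during the ramp, then settles to a value that is lower than the dynamic peak but still above baseline, mirroring the description of a discharge that relaxes to a level set by the new muscle length.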

The several distinct varieties of response
patterns found by Bowman and Combs
(1968) were explained by the unique
three-dimensional arrangement of the tongue
extrafusal fibers. This arrangement is in
contrast to limb extrafusal fibers where the
gradient of distortion to stretch is uniform
throughout the muscle due to the parallel
structure of the fibers within the muscle. A
stretch stimulus would be expected to
differentially affect a spindle according to the
specific orientation of the extrafusal muscle fibers,
that is, vertical, longitudinal, or transverse.
A muscle spindle located in the [...] arranged
fibers, for example, would respond
differently to an anterior-posterior stretch as
compared to a muscle spindle located in the
longitudinal fibers. Every unique configuration
of the tongue responsible for the
places and manners of articulation would
entail different lengthening and shortening
tensions on the extrafusal muscle fibers. A
distinctive overall pattern of afferent firing from
the sensory nerve fibers serving the muscle
spindles would be generated according to the
particular contour and position of the tongue
at the moment. The coordinate arrangement
of the tongue extrafusal fibers can also
explain the velocity responses to the release and
active stretch phases of the stimulus.
Bowman and Combs (1968) concluded that
the tricoordinate spatial arrangement of the
tongue muscle spindles “could constitute
part of the structural substrate for a highly
discriminative feedback system [p. 117].”
With the absence of joint movement in the
tongue musculature, the central motor centers
have need of distinctive information pertaining
to the rate, extent, and direction of movement.
Since the complexity and timing of
tongue movements are superior to that of the
limbs, it is logical to assume that the afferent
discharge pattern emanating from the tongue
should contain high-level distinctive information.
Such discriminative information can be
provided by the differential frequency
discharge patterning of the muscle spindles due
to the orientation of the extrafusal fibers
relative to the direction of movement.
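
One way to picture how fiber orientation could yield direction-specific discharge patterns is the following sketch. It is a deliberate simplification and not a model proposed in the studies reviewed: it assumes three idealized, mutually perpendicular fiber groups, a linear response to the stretch component along each group's axis, and hypothetical baseline and gain values.

```python
# Toy illustration of direction coding by three orthogonally oriented fiber groups.
# The axes, baseline, and gain are illustrative assumptions.

AXES = {
    "longitudinal": (1.0, 0.0, 0.0),   # anterior-posterior
    "transverse":   (0.0, 1.0, 0.0),   # right-left
    "vertical":     (0.0, 0.0, 1.0),   # superior-inferior
}

STIMULI = {
    "anterior":  (1.0, 0.0, 0.0),
    "posterior": (-1.0, 0.0, 0.0),
    "right":     (0.0, 1.0, 0.0),
    "left":      (0.0, -1.0, 0.0),
}


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def pattern(stretch, baseline=20.0, gain=10.0):
    """Firing rate of each fiber group: baseline plus the stretch component along its axis."""
    return {name: baseline + gain * dot(stretch, axis) for name, axis in AXES.items()}


def decode(rates):
    """Return the stimulus whose predicted pattern is closest to the observed rates."""
    def distance(name):
        predicted = pattern(STIMULI[name])
        return sum((rates[k] - predicted[k]) ** 2 for k in AXES)
    return min(STIMULI, key=distance)


for name, stretch in STIMULI.items():
    print(name, pattern(stretch), "->", decode(pattern(stretch)))
```

Under these assumptions each of the four stretch directions produces a distinct pattern across the three groups, so the direction can be read back from the pattern alone, which is the sense in which the arrangement could act as a spatial reference system.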

Upon examination of units of the lingual
nerve, Bowman and Combs (1968) showed
that the adequate stimulus for response was
not stretch of the tongue muscles but distortion
of the tongue surface. Each lingual unit
was associated with a discrete receptive field on
the dorsal surface of the tongue. Such a
specialization can bring about the fine positional
control needed for accurate place of articulation
during the production of consonants. The
differing contact points of the tongue on the
alveolar ridge, hard palate, and soft palate for
the production of /t/, /tʃ/, and /k/, for
example, can be mediated by the discrete mapping
of the tongue surface by the afferents
of the lingual nerve. Thus, the
exteroceptive-touch neurostructure is largely handled
by the lingual nerve and the muscle-spindle
system (kinesthesia) through the
hypoglossal nerve. Both channels of sensory
information are needed for [...]
movement control (Sussman, in press).
Moreover, both nerve channels carry
enough coded descriptive information concerning
contact and movement events of speech
activity that the central nervous system is
more than amply informed as to the status of
the articulators at any moment in time.⁵

The information provided by peripherally 
located muscle-spindle systems might be inte- 
grated with higher-level cortical functioning 
by means of a type of interneuron revealed by 
Porter (1967) in the spinal trigeminal nu- 
cleus. These cells responded in a fashion that 
implied their involvement in the neural path- 
way from the cerebral cortex to hypoglossal 
motor neurons in the medulla. Stimulation of 
either the lingual nerve or the cerebral motor 
cortex caused these interneurons to discharge 
and induce excitation in hypoglossal motor 
neurons. Thus, spatially separated excitatory 
influences originating from the periphery or 
higher brain centers can converge on these 
neurons, and the integrated response will be 
transmitted to the appropriate hypoglossal 
motor neurons to bring about a means for 
efferent feedback control of ongoing lingual 
muscular events. 


Sensory Projections and Loop Time 


A recent study by Bowman and Combs 
(1969a) attempted to determine the cerebro- 
cortical projections of the hypoglossal afferents 
from the tongue muscle spindles of the rhesus 
monkey. Afferent fibers in the distal portion 
of the hypoglossal nerve were found to project 
to a restricted area of the contralateral sen- 
sorimotor cortex. The responsive area was 
within the classical boundaries of the motor 
facial area anteriorly (precentral gyri) and 
the sensory facial area posteriorly (postcentral
gyri). There were no ipsilateral cerebrocortical
projections found. Of considerable importance
was the finding that the cortical potentials
disappeared upon sectioning of the
ipsilateral C2 or C3 dorsal roots. This indicated
that the afferent fibers present in the
distal portion of the hypoglossal nerve reached
the central nervous system through the second
and third cervical dorsal roots.

5 The abundant source of neuromuscular receptors
and sensory end organs found in the vocal-tract cavity
may offer an explanation for the intelligible
nature of speech produced under disturbed sensory
feedback conditions. Despite the loss of either tactual,
kinesthetic, or auditory feedback, the speech signal
maintains a core of intelligibility that is quite puzzling.
However, if one realizes the extent of sensory
cues that the motor control centers can rely on, it is
easy to understand the immunity of the speech production
system to disturbed feedback channels. The
motor centers can switch from one source of movement
information that is no longer functioning properly
to another sensory modality that is unaffected.



Of crucial importance to a closed-loop feedback
theory of speech production was the
additional finding indicating that the cortical
potentials evoked by hypoglossal stimulation
occurred with only a 4-5-millisecond latency.
Such a short latency suggested that the mediating
hypoglossal fibers were rapidly conducting.
Furthermore, the hypoglossal stimulus
intensities that were needed for elicitation of
a threshold cortical response were equivalent
to those needed to produce an evoked cortical
potential with threshold activation of the
spinal motor nerves (i.e., Group Ia sensory
fibers). The cervical dorsal root latency also
suggested rapidly conducting muscle afferents
as the afferent volley reached the ipsilateral
C2 dorsal root in approximately .5 milliseconds.
The caliber spectrum of the hypoglossal
nerve reveals a significantly greater number
of Group I diameter fibers in the distal portion.
Interestingly, the hypoglossal nerve of
the cat (where no muscle spindles have been
found) is characterized by a uniform caliber
spectrum throughout the peripheral course and
contains very few Group I diameter fibers
(Blom, 1960). The extremely fast conduction
speeds exemplified by the muscle-spindle afferents
lend credence to the view that during
the rapid volitional motor acts of speech, the
gamma-efferent muscle-spindle servosystem
can functionally operate to help bring about
movement control.

The neurological similarity between the
monosynaptic reflex common in the forelimb
skeletal muscles (muscle-spindle feedback system)
and the hypoglossal nerve reflex
initiated by tongue spindle afferents appears
obvious. In both instances the rapidly conducting
Group Ia afferent fibers mediate
the cortical projections.
Bowman and Combs stated:


the existence of a short-latency, rapidly acting feedback
channel from the lingual spindles thus could
form part of a neural mechanism engaged in regulating
tongue movements elicited from the cortex.
In this regard, the type of information forwarded 
by the lingual spindle afferents in response to move- 
ment could be a discriminative indicant of the nature 
of ongoing lingual motor activity [p. 300]. 


Upon examining cerebellar responsiveness
to the afferent discharges of muscle-spindle
fibers in the hypoglossal nerve, Bowman and
Combs (1969b) failed to detect evoked cerebellar
responses. Neither ipsilateral nor contralateral
hypoglossal nerve volleys elicited
action potentials from known cerebellar loci.
This finding was tentatively explained by citing
the relationship between Group I spindle
fiber projections and the complexity of motor
control involved. Oscarsson and Rosen (1963)
showed that a direct relationship exists between
the extent of motor control of a muscle
group from which the muscle-spindle afferents
originate and the projection pathways of the
Group Ia fibers. The forelimb muscle nerves
of the cat, for example, have cerebrocortical
representations, whereas the hindlimb spindle
fibers do not. The forelimbs are more extensively
used in manipulative and explorative
actions, while the hindlimbs are primarily
used for locomotion (and hence a lower neurological
governing system). Since the muscle
spindles in the tongue are most likely a
phylogenetically recent acquisition and since
lingual spindles do not exist in the musculature
of animals below the primates, it is not
unusual to find only cerebrocortical projections
(higher level neurological functioning)
and not cerebellar projections from tongue
spindle afferents in the hypoglossal nerve. The
greater amount of volitional lingual motor
control in higher organisms,⁶ and especially
the extent of tongue-movement complexity
reached in human speech, can explain why
the tongue muscle-spindle afferent system does
not project to the same central areas as do
the limb-spindle afferents.

6 During a child's acquisition of the sounds of
language and during the early experimental
utterances of infancy, a great deal of volitional
tongue movements are observed.

Since the degree of motor control of the
human tongue, as exhibited in speech production,
far exceeds that of the primate tongue,
it is logical to suppose that the neural substrate
mediating the sensory feedback control
of the human tongue possesses even more
[...] mechanisms than revealed above
in the primate tongue. The detailed differentiation
of directional information provided
by the tongue's built-in tricoordinate muscular
system is probably more precisely
developed in man. In articulating the
[...] sequences of the varied languages
[...], the tongue must execute finely
[...] and timed positional movements.
The changing vocal-tract cavity during
speech production requires the central monitoring
[...] to be constantly informed as to
the length, spatial location, and rate of movement
of every muscle group used in active
articulation, especially those of the tongue,
the major speech articulator. As Ohman
(1967) stated:


The moment-to-moment changes of the mechanical
state and geometric configuration of the whole system
must be continually signaled back to the brain
[p. ...].

PSYCHOPHYSIOLOGICAL IMPLICATIONS


The neurophysiological research that has
been reviewed can generate several new
ideas and offer support for already existing
[...] to theoretical discussions of
speech production. At present there seem to
be two basic theories concerning the problem
of [...] invariance to account for the encoding
process of language. One camp suggests
that speech sounds result from a finite store
of neuromotor commands possessing some
invariant relationship to a linguistic category,
for example, the phoneme (Liberman,
Cooper, Harris, & MacNeilage, 1962; Liberman,
Cooper, Shankweiler, & Studdert-Kennedy,
1967). The other position holds that the
invariance, which must exist somewhere, might
be in idealized target positions or [...]
(Halle & Stevens, 19...;
[...], 1963; Stevens & House, [...]).
New speculation on a target-based model to explain
the serial ordering of speech has recently been
offered by MacNeilage (1970). In this view,
one possesses an internalized space coordinate
system that specifies invariant spatial
targets for the articulators to achieve.
Motor commands, therefore, are generated as
target-directed speech movements.

The findings of Bowman and Combs (1968, 
1969a, 1969b) and Porter (1967) lend strong 
support to a model based on the notion of 
the speaker’s having an internalized spatial 
representation of the oral area. In order for 
an internalized space coordinate system to be 
realized, there must be sensory mechanisms 
available to signal the central nervous system 
regarding the status of the attainment of 
specified targets. The intricate muscle-spindle
system of the tongue, which is a built-in 
spatial reference system, can provide the nec- 
essary information to keep the motor control 
centers aware of instantaneous muscle length, 
spatial position, and velocity. The type of 
interneuron found by Porter (1967) illustrates
how peripheral feedback information
could possibly be integrated with cortical sig- 
nals to elicit proper motor commands to the 
intrinsic and extrinsic tongue musculature. 
With such a complex monitoring and inte- 
grating system operating, the articulatory 
targets can be achieved with a constant regu- 
larity. Indeed, the neurological reality of a 
target mechanism is given a firm supporting
background with such neurophysiological find- 
ings. 

In addition to establishing the existence of 
muscle-spindle-hypoglossal pathways, Bow- 
man and Combs (1968) revealed that each 
nerve unit of the lingual nerve (mandibular 
branch of the trigeminal nerve) was associated 
with a discrete receptive field on the dorsal 
surface of the tongue. It is well known that 
the somatotopic organization of the sensory
and motor areas of the cerebral cortex pro- 
vides for a relatively large area devoted to 
the facial musculature, especially the lips and 
tongue (Ruch, 1965). The finding by Bow- 
man and Combs substantiates the view that 
there is a vast potential existing within the 
oral cavity for a detailed, one-to-one mapping 
of the oral area onto corresponding cortical 
areas. Hence, the two neural systems (hypo- 
glossal and lingual) can provide an extensive 
repertoire of information to the higher con- 
trol centers to bring about closed-loop control 
of target attainment during the articulatory
gestures of speech.

As a demonstration of the necessity of such
a neural system, consider the case of speaking 
with a pipe clenched between the teeth. Un- 
der these conditions there is a remarkable re- 
organization of the muscular movement pat- 
terns of the tongue and lips to compensate 
for the stationary mandible. These modifica- 
tions result in perfectly intelligible speech. A 
speech production model based on an organ- 
ized spatial system of neuromuscular mecha- 
nisms capable of providing moment-to-moment 
feedback information can explain such spon- 
taneous modifications of the articulatory 
movement patterns. There is no need to spec- 
ulate on the existence of a store of prepro- 
grammed, open-loop, speech neuromotor com- 
mands “to be used in case of emergency.” 
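
The contrast drawn above between a preprogrammed, open-loop command store and a feedback-driven, target-directed system can be illustrated with a minimal numerical sketch. Everything in it is a hypothetical simplification introduced here for illustration, not a model from the studies reviewed: the one-dimensional "position," the target value, the proportional gain, and the function names open_loop and closed_loop are all assumptions.

```python
def open_loop(commands, start):
    """Apply a fixed, preprogrammed sequence of displacements (no feedback)."""
    pos = start
    for c in commands:
        pos += c
    return pos


def closed_loop(target, start, gain=0.5, steps=30):
    """Move toward a spatial target, correcting from sensed position each step."""
    pos = start
    for _ in range(steps):
        error = target - pos   # feedback: remaining distance to the target
        pos += gain * error    # command scaled to the remaining error
    return pos


TARGET = 10.0            # hypothetical articulatory target (arbitrary units)
NORMAL_START = 0.0       # usual starting position
PERTURBED_START = 3.0    # starting position altered, e.g., by a fixed mandible

# Displacements tuned once for the normal starting position (they sum to 10.0).
COMMANDS = [2.0] * 5

normal_open = open_loop(COMMANDS, NORMAL_START)
perturbed_open = open_loop(COMMANDS, PERTURBED_START)
perturbed_closed = closed_loop(TARGET, PERTURBED_START)

print(normal_open, perturbed_open, round(perturbed_closed, 6))
```

Under these assumptions the fixed command sequence reaches the target only from the starting position it was tuned for, whereas the feedback version converges on the same target from either start, which is the sense in which no separate "emergency" store of commands is needed.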

The dynamic flexibility and efficiency ex- 
hibited by the speech motor control circuits 
is also very much dependent upon the velocity 
information that the muscle spindles are 
able to furnish. Ohman (1967) offered 
an illustration of the mechanical complexity 
involved in achieving articulatory, and hence 
acoustic, efficiency in speech production. Fo- 
cusing on the movement requirements of the 
tongue tip during the articulation of a dental 
stop consonant in a vowel-consonant-vowel 
phonetic context, Ohman pointed out that 
quite different events occur for the utterance 
/ada/ as compared to /idi/. The coarticulatory
influences of the high front vowel /i/
and the low back vowel /a/ create a drasti- 
cally different set of circumstances in terms 
of motor signals controlling the apical /d/ 
gesture. First, the distances to be traversed 
are quite different in the two situations with 
the tongue having to travel considerably 
further after an initial /a/ to reach the alve- 
olar ridge as compared to an initial /i/. Sec- 
ond, the overall contour and state of the 
tongue musculature* preceding the consonantal
gesture represent two opposite extremes
in the two utterances. It is remarkable, to say
the least, that the formant transition durations
from vowel to consonant and from consonant
to vowel are practically identical (Ohman,
1965) in such circumstances. It seems
very probable that in situations involving
highly variable mechanical and phonetic factors,
higher control centers would rely heavily
on movement-rate information as provided by
tongue muscle-spindle receptors to achieve the
dynamic movement constancy found in speech
production. In order to make sure that an
articulator arrives at a certain point at a
certain time, some knowledge of how fast the
articulator is traveling must be obtained. Such
velocity information is, moreover, multileveled
in that initial onset, steady state, and terminal
slowdown velocity information can be supplied
by the spindles. Servocontrol, as brought about
by muscle spindles, can thus alter the rate of
contraction of the extrafusal muscle fibers.
[...] such as speech, is realized when perceptual
criteria are used to judge the efficiency of the
system. As Ohman (1967) stated:

It is clear, from the acoustic point of view, however,
that small margins of variability must be maintained
for onset and release [italics added] of
consonants, since the auditory quality of these [sounds]
depends critically on the exact shape of the formant
transitions contained in the adjacent vowels. Thus,
too slow a /b/ release will sound like a /w/
[p. 50].

* For the high front vowel /i/ the tongue assumes
a forward position in the oral cavity with a [...]
and constriction of the tongue in the palatal region.
The tongue body is relatively high during the
production of the /d/ gesture. Conversely, during the
low back vowel /a/ the tongue lies relatively spread
out and limp at the bottom of the mouth.

CONCLUDING REMARKS

More research is needed to uncover the
intricate feedback mechanisms underlying the
motions of speech and the neural basis of
their operation. Only a closed-loop control
approach to the study of the motor control of
speech can justify research endeavors probing
into the peripheral musculature of the speech
production apparatus. Obviously, an open-loop
approach assumes a preprogrammed speech
motor command system, operating independently
from whatever is happening at the
periphery. The justifiable realm of the speech
scientist is the periphery, where the observable
and measurable events take place. Inferences
about the workings of the “black box” can
best be made from behavioral data indicating
on-line adjustments of the speech [...]
dependent upon either the feedback [...]
the system or superimposed mechanical constraints.

AN EXAMINATION OF SUBJECT ROLES, DEMAND
CHARACTERISTICS, AND VALID INFERENCE

Four conceptualizations of the role that subjects adopt in laboratory experiments
are explicated, and the empirical support for them is then assessed.
There is no unconfounded evidence that subjects adopt a good subject role
and seek to confirm hypotheses or that they adopt a negativistic role and seek
to sabotage experiments. There is some evidence that in specific contexts subjects
may adopt a faithful subject role, and there is considerable evidence that
subjects are apprehensive about how their performance will be evaluated.
Furthermore, being provided with a hypothesis about a study consistently produces
bias. These relationships are then examined with respect to their implications
for improving research in general and for drawing valid inferences
from experimental data.

A number of important issues have been
recently raised about laboratory experimentation
in the area of personality and social psychology.
These issues came into focus at different
times, and it is worthwhile detailing
them in their approximate chronological sequence.
An early issue involved Campbell’s
(1957) distinction between the internal and
external validity of experiments. A study is
internally valid if its findings were caused by
the experimental treatment. A study is externally
valid if its results can be generalized
across different settings and populations. This
distinction highlighted the limited external
validity of laboratory experiments, for findings
from them cannot usually be generalized
beyond laboratory settings or college student
populations.

Between 1962 and 1965 preliminary reports
were published that questioned whether experiments
could provide valid inference. Rosenberg
(1965) noted that subjects in experiments
are often anxious about how they
will be evaluated, and he suggested that this 
anxiety might sometimes differ between ex- 
perimental conditions. Rosenthal (1963) ex- 
pressed concern as to whether some findings 
can be attributed, not to treatment variables, 
but to the experimenter’s expectancies about 
how an experiment would turn out. Orne 
(1962) questioned whether some findings 
might be artifacts of “demand characteristics” 
—cues in the experimental setting that allow 
subjects to infer how they are expected to 
behave. Also, Hofstatter (1957) and Riecken 
(1962) proposed that some treatment effects 
might be alternatively interpreted in terms of 
the strategic dynamics associated with power 
differences, for in an experiment the experi- 
menter typically controls most of the re- 
sources, and it is in the subject's best interest 
that he behave so as to be rewarded and not 
punished. 

Most of these early concerns about validity 
were speculative in the sense that they were 
based on little systematic data. Rosenberg's 
earliest concern about evaluation apprehen- 
sion was based on a single study of his own, 
and Campbell's (1957) worry about the fre- 
quency with which treatment effects might be 
due to pretest sensitization was based on one 
panel study. Moreover, Orne's (1962) think- 
ing about demand characteristics was based 
on observation of the apparently cooperative 
behavior of a few subjects, just as Masling's 
(1966) comments on noncooperative subjects
were based on a few verbal reports. Further- 
more, Rosenthal's (1963) early work on ex- 
perimenter expectancies involved a single ex- 
perimental procedure that permitted no gen- 
eralization to other procedures, and much of 
his later work reached only marginal levels of 
statistical significance. Lastly, the contributions
of Hofstatter and Riecken did not include
any data about the behavior of subjects
in experiments.

These threats to valid inference were at 
one time widely discussed and accepted by 
social psychologists, despite the paucity of 
empirical evidence. The concern about in- 
ternal validity was contemporaneous with con- 
cern about the ethical issues associated with 
using human subjects in laboratory experiments.
The latter concern became widespread
after 1964 (Baumrind, 1964; Brown, 1965; 
Kelman, 1967; Schultz, 1969; Smith, 1969), 
though it had been expressed earlier (Vinacke, 
1954). One ethical issue related to the 
possibility of subjects being physically or 
psychically harmed in experiments. A sec- 
ond involved the consideration that experi- 
menters typically benefit from experiments, 
while subjects benefit very little, if at all. 
A third issue was related to the frequent 
use of deception in research, for deception 
might debase honesty in the name of science, 
and it might debase Science through ration- 
alized dishonesty. And a fourth issue centered 
on the fact that in most experimental pro- 
cedures, subjects are used as powerless ob- 
jects in a status hierarchy rather than as equal 
collaborators in a common research project. 
These ethical concerns were probably voiced 
more loudly, and received more favorably, 
precisely because it seemed questionable 
whether experiments were capable of the 
unambiguous causal analysis for which they 


formance of su 
volved system, 
of this resear 


in a single volume (Rosenthal &
Rosnow, 1969). Other researchers have provided
evidence indicating that some artifacts
may be less prevalent than was once feared.
Lana’s (1969) work on pretest sensitization 
has led Campbell (1969) to reassess his esti- 
mate of its prevalence. Barber and Silver 
(1968) reanalyzed the experimenter bias 
studies cited by Rosenthal, and their reanalysis
and their own failures to replicate Rosenthal’s
finding led them to question whether the
threat is as widespread as was feared. Furthermore,
some recent studies have shown that
asking subjects to role play in an experiment
produces the same outcome as deceiving subjects
about the experiment's true purpose
(Bem, 1967; Greenberg, 1967; Horowitz &
Rothschild, 1970), a finding that has obvious
potential ethical relevance.

What is still missing from the literature is
a review of this recent empirical research on
the nature and prevalence of the threats to
validity that result from the way subjects
behave in experiments. This is the major purpose
of the present discussion. Specifically, the
review explicates the roles that subjects have
been assumed to adopt in experiments, and it
assesses the empirical support for them.
Also, the antecedents of any biases due to
subject artifacts are detailed. Finally, there is
a discussion of how subject behavior affects
the internal and external validity of experiments
and of how laboratory experimentation
can be improved. Throughout, stress [...]

CONCEPTUALIZATIONS OF THE SUBJECT'S ROLE

Subject’s Role

The concept of “subject role” has frequently
been invoked to explain the behavior of subjects
in experiments. Riecken (1962) used it
to compare the subject’s behavior to the role-related
behavior of subordinates in a power
hierarchy, while Orne (1962) wrote: “Within
the context of our culture the roles of
subject and experimenter are well understood
and carry with them well-defined mutual role
expectations [p. 777].” Four roles have thus
far emerged from discussions of subject behavior.
We designate the roles as the good
subject, the faithful subject, the negativistic
subject, and the apprehensive subject.

The essence of the good subject is that he
attempts to give responses that, in his opinion,
will validate an experimental hypothesis. The
concept of the good subject comes from Orne's
(1962) seminal discussion of demand characteristics,
where the concept is defined and its
motivational strength is alluded to:


Admittedly, subjects are concerned about their performance
in terms of reinforcing their self-image;
nonetheless, they seem even more concerned with the
utility of their performance. We might well expect
then that as far as the subject is able, he will behave
in an experimental context in a manner designed to
play the role of a “good subject” or, in other words,
to validate the experimental hypothesis [p. 778].


As Orne made clear elsewhere, the motivation
for playing the good subject role is the subject’s
wish to provide data that will be of use
to science or to the experimenter, both of
whom are dependent on the subject for help.
Whether the subject can help depends on
whether he knows which responses would
confirm a hypothesis. Thus, Orne's notion
entails both the motivation to perform the
good subject role and the availability of interpretable
cues that allow the subject to infer
how he can perform the role adequately. The
good subject has been called the beneficient
subject (Levy, 1967) and the cooperative subject
(Sigall, Aronson, & Van Hoose, 1970).

Fillenbaum (1966) used incidental statements
by Orne (1962) and Riecken (1962)
and his own research to coin the term faithful
subject. The faithful subject is someone who
believes that a high degree of docility is required
in research settings and who further
believes that his major concern should be to
scrupulously follow experimental instructions
and to avoid acting on the basis of any
suspicions he might have about the true purpose
of a study. There are probably two distinct
versions of the faithful subject role. The
passive version assumes that subjects are relatively
uninvolved in experiments and that
they docilely and apathetically follow instructions.
The active version assumes that subjects
are motivated to help science and that when
they are extremely suspicious, they “will lean
over backwards to be honest . . . otherwise,
erroneous conclusions will be drawn [by the
experimenter] [Orne, 1962, p. 780].” The
major importance of both versions of the
faithful subject role is that biased experimental
performance should not result from
playing the role.

Just as the good subject is assumed to want 
to confirm a hypothesis, so the negativistic 
subject (Cook, Bean, Calder, Frey, Krovetz, 
& Reisman, 1970) is assumed to want to in- 
firm it by corroborating some hypothesis other 
than the experimenter's or by giving responses 
that are of no use to the experimenter. This 
orientation has also been called that of the 
recalcitrant subject (Fillenbaum & Frey, 
1970), and its effects have been called a 
“screw you effect" (Masling, 1966) or a 
“boomerang effect” (Silverman).⁴ According
to Masling, the motivation for this negativistic 
role is the subject’s displeasure at thinking 
that his behavior is being controlled by others. 
Argyris (1968) echoed this when he compared 
subjects to lower-level employees who some- 
times show covert hostility to their superiors. 
Both of these motivational approaches are 
similar to Brehm’s (1966) theory of psycho- 
logical reactance. 

The final role that subjects adopt has been 
commented upon by Riecken (1962) and 
especially by Rosenberg (1965, 1969). Rosen- 
berg postulated that subjects are sometimes 
apprehensive about how their performance will 
be used to evaluate their abilities or their 
socioemotional adjustment. Rosenberg further 
assumed that subjects are especially motivated 
to present themselves favorably to psycholo- 
gists who are expert evaluators of ability and 
adjustment. As a result, evaluation apprehen- 
sion might be easily aroused by experimenta- 
tion—by the mere thought of experimentation 
or by specific task cues in a laboratory setting. 
Furthermore, additional cues will often be 
available that can be used to infer which re- 
sponses will lead to the more positive evalua- 
tion. In a sense, the dynamics underlying the 
behavior of subjects who are apprehensive 
about how they will be evaluated are similar 
to the dynamics involved in giving socially desirable
responses (Crowne & Marlowe, 1964)
or self-enhancing self-presentations (Goffman,
1959; Jones, 1964). Even though Rosenberg
never used the expressions himself, we henceforth
refer to the apprehensive subject and
the apprehensive role.

4 I. Silverman. Motives underlying the behavior of
the subject in the psychological experiment. In W.
E. Vinacke (Chm.), Ethical and methodological problems
in social psychological experiments. Symposium
presented at the meeting of the American Psychological
Association, Chicago, September 1965.


Process of Role Behavior in Experiments 


Each of these four conceptualizations im- 
plies a two-stage process, the first of which 
is the arousal of motivation to adopt a role, 
and the second of which is the perception of 
cues that guide behavior and make it congruent
with the aroused motivation. The motivation
to adopt a role can be aroused by
factors that antedate an experiment, for in- 
stance, by personality differences, prior ex- 
perience in experiments, gossip about experi- 
ments, the act of volunteering, or simply an 
awareness of the types of things that psy- 
chologists study. Motivation to adopt a role 
may also be aroused by cues within a par- 
ticular experiment. Thus, evaluation appre- 
hension might be high in a study of “normal 
psychological functioning,” or motivation to 
be a good subject might be strong if subjects 
know that the experimenter will be awarded 
his doctorate only if his hypothesis is con- 
firmed. Whenever such arousal cues are cor- 
related with experimental treatments, it will 
not be clear whether effects are due to treat- 
ments or to a particular subject motivation. 

Though cueing into a role may be necessary 
for adopting that role, bias can only result 
when subjects know what to do about their 
motivation. Such performance cues will nor- 
mally be found within the procedure of an ex- 
periment, and one set of cues that has re- 
ceived widespread attention is that which al- 
lows the subject to generate a hypothesis 
about a study. Orne (1962) has written: 


the totality of cues which convey an experimental
hypothesis to the subject become significant determinants
of subjects' behavior. We have labeled the
sum total of such cues as the demand characteristics
of the experimental situation [p. 779].


There will often be no cues to a hypothesis,
either because the experimenter has no hypothesis
or because the procedure has been
designed so that none can be learned or because
subjects may be faithful and will not
look for a hypothesis. However, even if there
are no cues to a hypothesis, there may be
cues that are relevant to experimental performance.
For example, “doing well” in verbal
operant conditioning experiments can be defined
as appearing intelligent and learning a
reinforcement contingency. Thus, cues to a
hypothesis or to performance levels may be
powerful determinants of what subjects can
do about a particular motivational set.


Conceptual Problems Related to Subject Roles 


There are several problems associated with
the hypothetical roles that subjects adopt in
experiments. The first concerns the ambiguity
of dependent variable measures for inferring
the roles that have been adopted. For example,
if subjects increase the frequency of
reinforced responses in a verbal conditioning
study, they may do so because they want to
be evaluated as intelligent or because they
want to be good subjects. In a poorly disguised
attitude-change experiment, subjects who show
no change may want either to appear independent
or to be negativistic. And even if
subjects do change attitude, they may do so
either because they want to be evaluated as
flexible and open or because they want to be
good. Furthermore, if subjects in an experimental
condition that is designed to let them
adopt one of several roles perform in exactly
the same way as “naive” control subjects, this
may be because they chose to perform faithfully,
or because they could do nothing about
their wish to be good or negativistic, or because
the dependent variables were insensitive
to any differences in performance that occurred.
Since the dependent variables
allow multiple interpretations of the outcome, [...]

The construct of the apprehensive subject
is especially difficult to falsify. Often it is not
clear exactly how an apprehensive subject
might be expected to respond. Thus, in the persuasion
setting mentioned earlier,
apprehensive subjects might change attitude (presenting
themselves as flexible) or not change
attitude (presenting themselves as independent).
This post hoc explanatory flexibility
lessens the predictive usefulness of evaluation
apprehension in some experimental situations.
Of course, there are other research settings in
which the socially desirable response is less
ambiguous, for example, in conformity research,
problem-solving and ability tasks, and
measures of socioemotional health.

A second problem is concerned with the
definition of these particular subject roles.
Each of them is a potentially dangerous reification
of a hypothetical construct that cannot
be directly manipulated or measured. Instead,
the antecedents of roles have to be
manipulated in order to test the roles themselves.
Yet there has been to date no rigorous
explication of such antecedents. Furthermore,
some of the roles are not obviously distinct in
their conceptual basis. Both the good subject
and the active version of the faithful subject
are roles that are based on the subject’s
willingness to help the experimenter or science.
But in one case the subject wants to help by
confirming hypotheses, and in the other case
he wants to help by being honest so that
nature will be faithfully reflected in his response.

A third problem concerns generalizing from
experiments that were deliberately designed to
test subject roles to experiments that were designed
to test more general theories of behavior.
In testing for subject roles, the antecedents
of a role should be directly and
strongly manipulated. But in most other research,
an antecedent of a subject role would
not be the treatment; it would merely be correlated
with some other theory-relevant treatment.
In testing roles, the subject should be
provided with a hypothesis, or a hypothesis
should be easy to learn. But in most other research,
hypotheses are deliberately camouflaged.
Furthermore, in testing roles, the hypothesis
is normally a simple one, and experimenters
typically specify, either explicitly or
implicitly, the direction of the subject’s performance
(“You should change attitude,” or
“You should complete more than 10 items
[...],” etc.). But in most other research,
the hypothesis is a complex and differentiated
one that might even involve an interaction.
Complying with a differentiated hypothesis is
a complex process which requires that the
subject knows, first, what the other experimental
conditions are; second, how persons in
each of these conditions are supposed to behave;
and third, how he can calibrate his hypothesis-confirming
behavior so that he registers
in responses the hypothesis that he learns
in words. These differences suggest that ex- 
periments on subject roles, even though they 
involve target settings (the laboratory) and 
target populations (college sophomores, usu- 
ally) do not involve target treatments (treat- 
ments correlated with a subject artifact) and 
target procedures (with differentiated hypoth- 
eses that are difficult to learn). As a con- 
sequence, we cannot safely generalize from 
experiments designed to test subject artifacts 
to normal theory-testing research. 

Whether the subject roles are descriptive or 
explanatory concepts is a fourth problem. The 
faithful role, good role, and negativistic role 
at a descriptive level indicate that subjects 
make responses which do not affect, falsely 
confirm, or falsely infirm the researcher’s pre- 
dictions. However, various motivations might 
lead to the response characteristic of any one 
role. The apprehensive role, on the other 
hand, specifies the subject’s motivation, but 
does not describe responses with reference to 
the researcher’s hypothesis. In the literature 
to date, these subject roles have been used, 
unfortunately, to refer to motivations, or
responses, or both.


Utility of Subject Roles 

The above are serious problems of defini- 
tion, falsification, generalizability, and level 
of analysis. Nonetheless, there is some utility 
to examining hypothetical subject roles. The 
falsification problem is less serious than it 
appears, for the pattern of findings can be ex- 
amined across many imperfect experiments 
with different weaknesses. If all of these corroborate
just one or two roles, then inferences
about roles are possible (Campbell & Stanley, 
1963). Moreover, the problem of external va- 
lidity is one that is best broached once it has 
been rigorously established that subjects do 
adopt particular roles. The best designs to 
answer this last question will normally be de- 
signs that have low external validity but high 
power to falsify. Finally, the definitional 
weaknesses are probably a reflection of the 
dearth, until the last 3 years, of data about 
subject performance. Precise definitions can be
more reasonably expected as responses to data 
than as elaborate a priori theory building. 
There is considerable practical advantage to 
be gained from establishing whether subjects 
regularly seek to confirm hypotheses, to in- 
firm them, or to disregard them, for each of 
these orientations describes a different kind of 
performance with respect to an experimenter’s 
hypothesis. Experimenters would be able to 
conclude from the roles whether their hypothe- 
ses are likely to be confirmed for the wrong 
reasons (the good subject), or whether they 
are likely to be inadequately tested (the nega- 
tivistic subject), or whether they will not be 
affected by subject behavior (the faithful sub- 
ject). This obvious utility militates against 
extreme reductionism, for while it might be 
possible to reduce all subject behavior to
evaluation apprehension by asking, “Why does
the subject want to be good/negativistic/faithful?”
this would lose the power to relate
subject artifacts to consistent, directional differences
on dependent variables of concern to
experimenters.

Scientists interested in subject behavior
have typically used the good, negativistic,
faithful, and apprehensive roles to explain
how subjects behave in experiments. By retaining
the same role constructs, it should be
possible to evaluate which of these hypothetical
roles seem best supported empirically.
This extant literature and theorizing ought to
be examined on its own level of analysis to
see if the roles can account for the recent empirical
findings. If they cannot, however, new
constructs need to be developed that account
for the findings to be reviewed here.


[...] interview data are [...]
data could give important [...]
mediating processes. But [...]
subject behavior present behavioral data that
cast doubt on the validity of their own interview
data (Golding & Lichtenstein, 1970;
Levy, 1967), and interviews and questionnaires
are liable to the very artifacts they are
meant to uncover.

Orne's important concept of demand characteristics
requires learning a hypothesis, as
does the negativistic subject role. Studies
are classified according to whether or
not experimenters gave subjects an experimental
hypothesis to work on. Being given a
hypothesis permits a strong test of role behavior
(if it can be assumed that subjects accept
the hypothesis), since all subjects have
sufficient information to act upon their motivation.
Such information is less readily available
in the more representative instance where
subjects have to penetrate the arrangements of
an experiment and have to generate a hypothesis
for themselves.


Studies Where the Hypothesis is Given


The most direct test of subject roles is
made by informing subjects of a hypothesis.
This has been done in seven studies. Levy
(1967) used a verbal conditioning paradigm
in which a confederate did or did not explain
[...] gave more rewarded responses (good subject).
But the shape of the learning curve for subjects
who knew the hypothesis was identical
to that for presumably naive subjects (faithful
subject). However, since verbal conditioning
tasks can be easily interpreted by subjects
as problem-solving tasks, and since responding
to the reinforcement contingency
probably demonstrates intelligence, subjects
who showed conditioning may have wanted to
appear competent (apprehensive subject).
Levy’s study highlights, first, that bias is
produced by knowing a hypothesis and, second,
that if we considered this study alone,
the bias would be uninterpretable in terms of
subject roles.

Horowitz and Rothschild (1970) used role
play in an Asch-type conformity study. One
group of subjects was simply asked to play the
role of subjects in an experiment, while the
“nature and purpose of the [...]
fully disclosed [p. 225]” to another role-playing
group. A third group was deceived in fairly
typical fashion. Subjects with the simple role-playing
set produced data equivalent to the
deceived group, while subjects who had full
knowledge of the hypothesis conformed less
than the other groups. The findings allow only
two interpretations: that subjects with complete
knowledge of the hypothesis presented
themselves in the socially desirable way as
independent and nonconforming, or they were
negativistic. The data clearly rule out a good
subject explanation, which predicts high conformity
among subjects who knew the hypothesis,
as well as a faithful subject interpretation,
which predicts no differences in conformity.

Brehm and Krasin (reported in Brehm,
1966) tested a reactance theory prediction by
manipulating whether subjects' freedom to
form their own opinions was threatened. Although
the researchers suggest that their
study was about attitude change, we prefer to
interpret it in terms of compliance, for no
persuasive message was involved and subjects
were exposed to the responses of another student
on the dependent variable measure of
attitude. In the high-reactance conditions, subjects
were instructed, “We are sure you will
be greatly influenced by the opinions stated
[by the other student], and that your answers
this time will tend toward those of this
student [p. 107].” This manipulation clearly
allowed subjects to learn the presumed hypothesis
(though not the experimenter's true research
hypothesis), and subjects in the high-reactance
condition changed opinion less than
subjects in the low-reactance condition. This
decreased compliance when the presumed hypothesis
is known may be interpreted either as
negativism or as evaluation apprehension.
Gustafson and Orne (1965) used galvanic
skin response (GSR) measures to assess the
effects of different demand characteristics upon
a partly autonomic response. One group of
subjects was given the hypothesis that mature
and intelligent persons would be able to deceive
the experimenter (the need-to-deceive
condition), and another group was given the
hypothesis that only psychopaths could successfully
deceive (the need-to-be-detected condition).
Subjects then received feedback about
whether they were successful in deceiving or
being detected. Skin resistance increased when
subjects in the need-to-deceive condition
learned they had not been successful and
when subjects in the need-to-be-detected condition
learned they had not been detected.
Perhaps evaluation apprehension is increased
when subjects know a hypothesis and fail to
produce any self-enhancing responses that it
suggests. Explaining the GSR data in terms of
other roles would require postulating that subjects
had a great deal of control over their
level of skin resistance.

Sigall et al. (1970) performed a crucial 
study in which subjects were told different 
hypotheses about how many telephone num- 
bers they were expected to copy out of a 
book. In one condition, subjects were told that 
a high number was expected; in a second con- 
dition, subjects were told that a low number 
was expected; and in the third condition, sub- 
jects were told that they were expected to 
copy few numbers and that to copy many 
numbers would indicate obsessive-compulsive- 
ness. Subjects in the first and third conditions 
confirmed the hypothesis. But, in the second 
condition, where confirming the hypothesis 
conflicted with individual competency (evalua- 
tion apprehension), subjects preferred to copy 
many numbers, thereby giving the self-pre- 
senting and not the hypothesis-confirming re- 
sponse. Sigall et al. explained this behavior in 
terms of the apprehensive subject, and their 
study is unique in allowing only one inter- 
pretation in the condition where the good and 
the apprehensive responses were pitted against 
each other. The other two experimental condi- 
tions do not permit distinguishing between the 
good and apprehensive roles. 

Rosenberg (1969) conducted a series of 
studies to demonstrate the existence of evalua- 
tion apprehension and to illustrate the condi- 
tions under which it operates. Part of the 
procedure of these experiments involves in- 
forming subjects of response norms that act 
as obvious demand characteristics and so allow 
them to form a hypothesis about how they 
should behave. Using a variety of experimental 
tasks, Rosenberg showed that subjects respond 
in accordance with experimentally induced 
norms of psychological health, maturity, and 
intelligence. Moreover, individuals who score 
high on the Social Desirability scale exhibit 
stronger effects, especially when a setting is
mildly clinical or an experimenter controls 
valued resources. 

Golding and Lichtenstein (1970) used
Valins' (1966) bogus heart-rate procedure and
found no significant differences in liking for
picture stimuli between subjects who were
naive, suspicious, or fully informed of a hypothesis.
Golding and Lichtenstein placed little
emphasis on this outcome and concentrated on
the data from postexperimental interviews. It
is difficult to interpret their no-difference finding.
First, the data may reflect an unsuccessful
manipulation. Second, all subjects in all conditions
may have learned the hypothesis
(postexperimental interview data on awareness support
this interpretation). And third, the measure
of liking may not have been sensitive
enough, probably because of a ceiling effect
and high error variance, to discriminate small
differences between conditions.

Apart from the Golding and Lichtenstein 
study, a clear picture emerges of what sub- 
jects do when they know a hypothesis: They 
use it to determine performance, and bias is 
produced. This bias was often in the direction 
of the experimental hypothesis. But, in com- 
pliance or conformity studies and in the 
Sigall et al. (1970) study where the good and 
apprehensive roles are not confounded, the 
good response was not given, and the appre- 
hensive response was. 


Studies Where the Hypothesis Was Not 
Given to Subjects 


Subject artifacts can potentially occur because
subjects in some conditions are more
likely than subjects in other conditions to
generate a hypothesis for themselves. These
studies fall into five major categories based
on experimental procedure: studies of conditioning,
of conformity and compliance, of attitude
change, of incidental learning, and of
sensory deprivation.


Conditioning Tasks 


Subject artifacts may be responsible for
phenomena that have been demonstrated in
conditioning experiments. Page (Page,
1968, 1969, 1970; Page & Lumia, 1968) tried
to demonstrate this in his replications of studies
of classical conditioning, verbal operant conditioning,
and reinforcement-correlated modification
of figure-ground perceptions. Page's
replicated finding was that subjects who
reported awareness of the reinforcement contingencies
emitted more of the reinforced behavior
than subjects who reported no such
awareness. Page interpreted this in terms
of the good subject. A small proportion of subjects
reported awareness and failed to emit the
reinforced behavior, and this Page interpreted
as negativism. These interpretations are certainly
tenable. But they are based on
postexperimental interviews about awareness,
and an interpretation based on the apprehensive
subject cannot be ruled out. Large numbers
of subjects may have wanted to demonstrate
their insight into the reinforcement contingencies
by emitting the reinforced behavior,
and a smaller number may have wanted to
appear independent by not giving the
reinforced response.

The same confound of the good and apprehensive
roles occurs in the work of Holmes.
A first study demonstrated that subjects who
showed faster verbal conditioning were those
who had been in more experiments (Holmes,
1967). A second experiment demonstrated that
subjects who had experienced a positive experimental
history of three interesting experiments
conditioned faster, reported more
awareness of the reinforcement contingencies,
and volunteered for more future experiments
than did control subjects or subjects who had
experienced a negative history of boring and
useless experimental tasks (Holmes & Appelbaum,
1970). However, the three interesting
tasks involved the Thematic Apperception
Test (TAT), the Minnesota Multiphasic Personality
Inventory (MMPI), and the Raven
Progressive Matrices. Intelligence and
personal adjustment tests may have made
subjects in the positive history condition
self-conscious about their presentation in
the criterion verbal conditioning experiment.
Thus, the greater conditioning may have been
due, not to cooperativeness, but to the fact
that evaluation apprehension was higher in
the positive history condition than in the
negative history condition.

It seems, then, that bias is systematically
found among subjects who report knowing the
contingencies in experiments with obvious
reinforcement tasks. Most subjects emit the
reinforced response, but it is not clear whether
they are good or apprehensive. A very few
subjects consistently report knowing the
reinforcement contingency and do not emit the
reinforced response. This could be interpreted
as negativism, though an apprehensiveness
interpretation cannot be completely ruled out.


Conformity and Compliance Tasks 


Horowitz and Rothschild (1970) found that
conformity was lowest of all among subjects
who were informed of a conformity hypothesis.
The same relationship was found by Stricker,
Messick, and Jackson (1967) in a study that
involved three sessions. A battery of personality
tests was given in the first; subjects
experienced a modified Asch-type conformity
experiment in the second; and the third session
was a paper-and-pencil conformity task,
and suspiciousness about the experimenter's
hypothesis was measured. Suspiciousness and
conformity were negatively related for both
conformity tasks. If it can be assumed that
subjects who reported suspiciousness about the
experimenter's hypothesis also learned his
hypothesis, and if it can be assumed that the
negative correlations are not artifacts of some
selection factor, then the Stricker et al.
data are similar to those from conformity
studies where subjects were informed of a
hypothesis. The data are indicative of a negativistic
or an apprehensive subject, but not
of a good or faithful subject.
The Horowitz and Rothschild (1970) study
had a condition where subjects were informed
neither of a hypothesis nor of deception. They
were simply asked to role play being in an
experiment. Interestingly enough, there was no
difference in conformity between the role-play
group and a group that was deceived in
typical fashion, and each of these groups
conformed more than subjects who were explicitly
told a hypothesis. There are two apparent
explanations for the absence of bias
among role-play subjects. One is that such
subjects simply could not learn a hypothesis,
and another is that the role-play instructions
made salient the faithful subject role, and so
subjects refrained from searching for or acting
upon a hypothesis. These related possibilities
are also confounded in the other role-play
experiments that have indicated no performance
differences between role-play subjects and con- 
ventionally deceived subjects (Bem, 1967; 
Darroch & Steiner, 1970; Greenberg, 1967). 

A study of role playing and conformity by 
Willis and Willis (1970) further emphasizes 
the importance of the role-play instructions. 
In this investigation, the hypothesis was not 
given to role-play subjects, but the deceptions 
in the experiment were explained to them, and 
they were asked to play the role of naive sub- 
jects. Role-playing subjects reproduced the 
main effects in the experiment, but an im- 
portant interaction, which the deceived sub- 
jects produced, did not appear among role- 
play subjects. The absence of an interaction 
makes it difficult to infer support for any one 
subject role. Nevertheless, the failure to 
replicate behavioral findings warns against the 
casual use of role play as an alternative to 
deception and intimates that if role play must 
be used, information given to subjects about 
procedures should be limited. 

Brock and Becker (1966) investigated the 
effects of deception and debriefing in a com- 
pliance study. They used two immediately 
consecutive experiments. In the first, all sub- 
jects were deceived and were then massively 
debriefed, or partially debriefed, or not de- 
briefed. In the second, subjects caused high 
or low damage to a machine, and there were 
or were not common cues which linked the 
two experiments. After the second experiment, 
subjects were asked to sign a strongly
counterattitudinal petition. Only persons in the
high-damage conditions signed, and partitioning χ²
showed that neither prior deception nor common
cues affected performance among such
persons.⁵ If it is assumed that subjects did


5 Brock and Becker (1966) computed Fisher exact 
probability tests within the high-damage condition 
in order to examine differences in petition signing due 
to debriefing and common cues. Neither of these 
variables affected petition signing as shown by
partitioning of χ². Brock and Becker reported that all
conditions differed from the
massive-debriefing-common-cues condition with p values from .01 to .10.
However, these values were recomputed by the
present authors, and it appears likely that Brock and
Becker used inappropriate one-tailed tests. Using a
two-tailed test and α = .05 showed that only 1 of 5
possible comparisons was statistically significant. The 
low signing in the massive-debriefing-common-cues 
condition may therefore be due to chance. 
not understand the relationship between the
earlier experiments and petition signing, then 
Brock and Becker's finding is like the finding 
from conformity studies: Performance is not 
affected when a hypothesis is not learned. 
However, it is not clear from the Brock and 
Becker study whether subjects simply failed 
to learn a hypothesis or whether, when partial 
or massive debriefing were linked to common 
cues, the common cues served to make salient 
the faithful subject role. 
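
The distinction drawn in Footnote 5 between one-tailed and two-tailed Fisher exact tests can be illustrated with a minimal sketch. The cell counts below are hypothetical and are not taken from Brock and Becker (1966); the function names are likewise illustrative assumptions. The sketch shows only how the same 2 x 2 table can fall below α = .05 with a one-tailed test and above it with a two-tailed test.

```python
from math import comb

def hypergeom_pmf(a, row1, row2, col1):
    """Probability that the top-left cell equals a, given fixed margins."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)

def fisher_exact(a, b, c, d):
    """Return (one_tailed, two_tailed) p values for the table [[a, b], [c, d]].

    one_tailed: probability of a top-left count of a or fewer.
    two_tailed: summed probability of every table that is no more
                likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probs = {k: hypergeom_pmf(k, row1, row2, col1)
             for k in range(lowest, highest + 1)}
    p_observed = probs[a]
    one_tailed = sum(p for k, p in probs.items() if k <= a)
    # Small tolerance so that equally likely tables are counted despite rounding.
    two_tailed = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-9))
    return one_tailed, two_tailed

# Hypothetical counts (assumed for illustration only):
# row 1 = 1 signer, 9 nonsigners; row 2 = 6 signers, 4 nonsigners.
one_tailed, two_tailed = fisher_exact(1, 9, 6, 4)
print(f"one-tailed p = {one_tailed:.3f}, two-tailed p = {two_tailed:.3f}")
```

With these assumed counts the one-tailed value is below .05 and the two-tailed value is above it, which is the kind of discrepancy the footnote describes.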

It seems, then, that subjects conform less 
when they learn a hypothesis in a conformity 
study, and this outcome can be interpreted in 
terms of the apprehensive or negativistic sub- 
ject, but not in terms of the good or faithful 
subject. However, no bias is found when the 
hypothesis is not learned and when cues to de- 
ception or instructions to role play are made 
available to subjects. It is not yet clear 
whether the absence of bias results from simple
inability to learn a hypothesis or from
subjects' adopting a faithful subject role that
impels them not to search for or act upon a
hypothesis.


Attitude Change Studies 


There have been a number of persuasion
studies where subjects were not provided with
a hypothesis and where increased persuasibility
was found in conditions designed to
elicit subject artifacts. Silverman found increased
persuasion where subjects were told
they were in an experiment [...]
being ignorant about [...]
stronger among [...]
occurred only [...]. He also found [...]
in a distraction [...] followed
a prior deception experience (Silverman,
Shulman, & Wiesenthal, 1970). These outcomes
[...] 1970), Silverman seems to prefer
[...] interpretation over a good
[...] that subjects want to
present themselves as open and flexible. But
there are no data that unambiguously support
this interpretation.
Cook et al. (1970) manipulated whether
subjects took part in four deception studies or
none before participating in a test experiment
on attitude change. The experiments were all
separated by weekly intervals. This history of
prior deceptions had no effect on persuasibility,
and it is possible that the bias-inducing
effects of prior deception (Cook et al., 1970
[Metaexperiment II]; Silverman et al., 1970)
will be limited to immediately consecutive
experiments. Alternatively, the rather dull
experiments of Cook et al. may have made subjects
blasé and submissive, and so they may
not have actively searched for any hypotheses
or performance-directing cues.
Another set of studies has shown that no
measurable bias is found when subjects [...]
were not functionally equivalent. [...]
when a cue to deception was provided [...]
a persuasion study (the test experiment), no
bias resulted, and the difference in [...] between
experiencing a deception and learning
about it disappeared. Cook and Perrin (1971)
partially replicated this finding using [...]
learning as the dependent variable and
the introduction to the study as the cue to
deception (“This is one of those studies whose
true purpose just cannot be revealed to you
right now”). The fact that cues to deception
did not cause bias, and the fact that they
eliminated the bias due to [...],
may indicate that subjects adopted an active
version of the faithful subject role. If so, a
cue to deception may merely be a cue to
that role.

A final attitude change study is Rosenberg's
(1965) well-known experiment on the role of
payments in reducing dissonance. It was designed
to show that the inverse relationship
between attitude change and incentive
that was found by Festinger and Carlsmith
(1959) and by Cohen (in Brehm &
Cohen, 1962) was due to the high incentive
making subjects apprehensive about whether
the experimenter would see them as persons
who had been “bribed” to change their attitude.
Rosenberg modified Cohen's study by
dissociating the counterattitudinal advocacy
from the measurement of final attitude, and
he found a positive relationship between incentive
level and the amount of change. Although
later studies (Carlsmith, Collins, &
Helmreich, 1966; Cook, 1969) have avoided
the above measurement problem and have
resupported the inverse relationship, Rosenberg's
study does make the important point
that certain independent variables, like payment,
may arouse evaluation apprehension
differentially across experimental conditions.
Reviewing the attitude change studies on
subject artifacts presents a less clear picture
than that for conditioning or conformity
studies. Bias is found and is typically in the
direction of increased attitude change. Whether
this increase results from subjects being apprehensive
or good is not yet clear, though
the interaction of anonymity and of knowing
one is in an experiment (Silverman, 1968) indirectly
supports an apprehensiveness explanation
over a goodness explanation. But apprehensiveness
can presumably lead to decreased
attitude change in those situations
where subjects feel decreased change to be in
their self-interest. No such conditions have
thus far been convincingly demonstrated.⁶
And there is even indirect evidence that
subjects provide faithful data if they have
participated in several experiments that were
spaced over several days or if a cue to deception
is openly provided immediately before an
experiment.


6 Evaluation apprehension does not predict the direction
of bias, but simply its congruence with the
socially desirable response, and so it is possible that
in some situations subjects may show decreased persuasibility.
Silverman and Shulman (1970) proposed
that decreased persuasibility does occur, and they
(Silverman & Shulman, 1969) have reported that
hungry female subjects will show resistance to persuasion.
Nonetheless, this review of persuasion studies
indicates that when bias occurs, it generally enhances
persuasibility.


Incidental Learning Studies


Fillenbaum has conducted a number of
studies on the effects of chronic and acute
suspiciousness of an experimenter's cover
story. He manipulated acute suspiciousness by 
deceiving subjects in an experiment that im- 
mediately preceded a test experiment (Fillen- 
baum, 1966), and he studied chronic suspi- 
ciousness by choosing two groups of subjects 
who differed in their reported suspiciousness 
about the rationale that experimenters give 
for their experiments (Fillenbaum & Frey, 
1970). The test experiment in Fillenbaum's 
work has been a word-canceling task, and the 
dependent variable has been incidental learn- 
ing of the message from which words were 
canceled. In no study did suspicious subjects 
significantly differ from nonsuspicious subjects 
in the amount learned, and Fillenbaum in- 
terpreted this in terms of the faithful subject. 
It should be pointed out, however, that suspi- 
cious subjects tended to learn more than non- 
suspicious subjects in all four experiments con- 
ducted by Fillenbaum (and in the study by 
Cook & Perrin, 1971), and that the combined 
probability of this trend across all of Fillen- 
baum's experiments might be statistically sig- 
nificant if it could be computed. But even if it 
were significant, it would merely indicate a 
consistent effect of trivial proportions. 
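
The idea of a combined probability across several small studies can be sketched in two ways. The first, a sign test, uses only what the text reports: the trend ran in the same direction in four experiments (five if Cook & Perrin, 1971, is added). The second, Fisher's method for combining p values, needs the individual p values, which are not reported here, so the values used below are hypothetical placeholders. Function names are illustrative assumptions, and neither calculation is presented as the analysis the original authors had in mind.

```python
from math import comb, exp, log

def sign_test_p(n_same_direction, n_studies):
    """One-tailed probability of at least n_same_direction studies
    agreeing in direction when either direction is equally likely."""
    favorable = sum(comb(n_studies, k)
                    for k in range(n_same_direction, n_studies + 1))
    return favorable / 2 ** n_studies

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square distributed
    with 2k degrees of freedom when all k null hypotheses hold."""
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # half of the chi-square statistic
    # Closed-form upper tail of chi-square for an even number of df (2k).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total

print(sign_test_p(4, 4))   # 0.0625  (four of four studies in one direction)
print(sign_test_p(5, 5))   # 0.03125 (five of five studies in one direction)

# Hypothetical one-tailed p values, assumed for illustration only.
hypothetical_p = [0.20, 0.25, 0.15, 0.30]
print(round(fisher_combined_p(hypothetical_p), 3))
```

Either calculation addresses only whether the trend is reliable; as the text notes, a reliable trend could still be of trivial size.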

What is not yet clear from the studies of 
incidental learning is whether the mundane 
experimental task made all the subjects in- 
different about their performance or whether 
suspicious subjects became actively faithful 
and refrained from acting upon their suspi- 
cions. The word-canceling task makes it 
likely that suspicious subjects could formulate 
hypotheses about the study in terms of the 
message content, and so, either directly or in- 
directly, they could learn it better (good sub- 
jects) or worse (negativistic subjects) if they 
wanted to. Hence, the task is probably sensi- 
tive to bias, but none has yet appeared. Per- 
haps an experiment by Minor (1970) using 
Rosenthal's picture-judging task is relevant 
here, for Minor found that experimenter ex- 
pectancies had no effect on picture ratings 
when evaluation apprehension was low and 
that they had an effect when evaluation ap- 
prehension was high. Apparently, high evalua- 
tion apprehension can motivate subjects to 
find and act upon incidental cues in an ex- 
periment. In Fillenbaum's studies, evaluation 
apprehension was probably low, and this might 
explain why subjects did not act upon their
suspicions. 


Sensory Deprivation Study 


Orne and Scheibe (1964) conducted a study 
to elucidate the operation of demand
characteristics. In the experimental condition,
subjects were socially isolated after first being
exposed to the medical paraphernalia and the 
panic button that typically accompany sensory 
deprivation studies. In the control condition, 
subjects were isolated without exposure to 
high demand characteristics. The experimental 
group reported more unusual experiences dur- 
ing isolation, and they tended to perform less 
well in motor performance tasks that followed 
isolation. It is difficult to infer which subject 
role best accounts for these differences, though 
it is less difficult to infer that experimental de- 
tails which are incidental parts of a treatment 
may sometimes be the true causes of effects 
that are attributed to the conceptually crucial 
parts of the treatment. It may be that no 
subject orientation at all is required to ex- 
plain the findings of Orne and Scheibe. They 
may have resulted simply from the high
anxiety aroused by the medical apparatus
combined with social isolation.


IMPLICATIONS FOR UNDERSTANDING SUBJECT 
BEHAVIOR IN EXPERIMENTS 


The conclusions from the literature review 
can be summarized as six statements about 
hypothetical subject roles, two about experi- 
mental task characteristics, and two about 
subjects’ experimental history. While these 
conclusions are well supported by the litera- 
ture, some are tentative in the sense that the 


crucial direct empirical tests must still be 
performed. 


Conclusion 1: There is widespread evidence 
for the apprehensive subject 


In every study reviewed, biased performance
could be interpreted as the result of evaluation
apprehension. Such strong support would
be trivial if the support merely reflected the
construct's definitional elasticity. Fortunately,
the construct can be put to approximate test
by considering the two experimental paradigms
where the apprehensive response is relatively
clear-cut and directional. In conformity
studies, the apprehensive subject would be
expected to make himself look good by not
conforming. In conditioning studies, apparently
perceived as problem-solving or performance
tasks, the apprehensive subject would be expected
to perform well and condition
quickly. In the several conformity studies reviewed
here where subjects knew or suspected
the hypothesis, they conformed less than subjects
who did not know the hypothesis; and
in the several conditioning studies reviewed
here, “aware” subjects tended to show greater
conditioning. What makes this evidence
stronger is that less conformity in conformity
situations can be interpreted only in terms of
the negativistic or apprehensive subject,
whereas greater conditioning in conditioning
situations can be explained only in terms of
the good or apprehensive subject. Thus, across
the two paradigms (comprising many studies),
parsimony makes a strong case for evaluation
apprehension. Nonetheless, the many forms of
self-presentation available to the apprehensive
subject seriously limit the predictive usefulness
of the construct in other experimental
situations. It is crucial that the antecedents
and outcomes of this vague hypothetical construct
be specified for other experimental
paradigms if evaluation apprehension is to be
a viable approach to understanding subject
behavior.
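
The parsimony argument under Conclusion 1 amounts to intersecting the sets of roles that can account for the bias in each paradigm. A minimal sketch follows; the role sets are those stated in the paragraph above, while the variable names and dictionary labels are illustrative assumptions.

```python
# Roles that can account for the observed bias in each paradigm,
# as stated in the text under Conclusion 1.
compatible_roles = {
    "conformity (less conformity when hypothesis known)":
        {"negativistic", "apprehensive"},
    "conditioning (greater conditioning when aware)":
        {"good", "apprehensive"},
}

# The most parsimonious account is a role compatible with both paradigms.
common_roles = set.intersection(*compatible_roles.values())
print(common_roles)  # {'apprehensive'}
```

Only the apprehensive role appears in both sets, which is the sense in which a single construct covers both paradigms. The sketch does not address the limitation raised in the text, that the construct's predictions are unspecified for other paradigms.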


[...] evaluation apprehension is absent [...]
across conditions. It is important to note [...]
of a good subject role. It [...]
that, contrary to some recent [...]
(Rosnow, 1970), the construct [...]


Conclusion 3: When the good and the apprehensive
roles are pitted against each other,
subjects prefer responses indicative of the
apprehensive subject

Evidence for this comes from a study where
the two roles were deliberately pitted against
each other, and subjects acted to present themselves
positively (Sigall et al., 1970). Less
strong evidence comes from the conformity
studies in which subjects knew a hypothesis.
These subjects were faced with the dilemma
either of confirming the hypothesis and appearing
nonindependent or of acting contrary
to the hypothesis. This last response was the
only one obtained.


Conclusion 4: Evidence for the negativistic
subject role is consistently confounded with
evaluation apprehension


The strongest evidence for negativism comes
from conformity studies and from the minority
of subjects in conditioning experiments
who reported awareness of a reinforcement
contingency but did not perform the reinforced
behavior. But evaluation apprehension
can explain both of these sets of data. It is
perhaps surprising that little evidence of negativism
was found in persuasion studies and in
settings where subjects knew a hypothesis, for
in both of these situations, subjects may have
felt that the experimenter was attempting to
control their behavior, a feeling which should
mediate negativism (Masling, 1966). The
paucity of evidence for negativism suggests
that Conclusion 4 is conservatively worded,
but until a situation is tested where negativism
and evaluation apprehension can be
unconfounded, the conservative conclusion
must stand.


Conclusion 5: There is evidence that in certain
restricted contexts a faithful subject role
may be adopted

The faithful subject role is more difficult to
test than other roles, and conclusions about
it must be cautious ones. But apparently
faithful behavior has been found in two situations.
The first situation is when subjects do
not know a hypothesis, when experimental
realism is low, and when the test experiment
does not arouse evaluation apprehension. More


specifically, faithfulness may be found after 
a series of spaced deception studies (Cook et 
al., 1970), or after a series of spaced experi- 
ments that are dull (Holmes & Appelbaum, 
1970), or following prior deception in an 
incidental learning task (Fillenbaum, 1966). 
In these cases, subjects were probably passive 
and submissively followed instructions. The 
second situation is when subjects are spe- 
cifically asked to role play and do not know 
a hypothesis (Bem, 1967; Greenberg, 1967; 
Horowitz & Rothschild, 1970), when subjects 
hear an obvious cue to deception in an ex- 
periment that immediately follows a prior de- 
ception (Brock & Becker, 1966; Cook et al., 
1970), and when a simple persuasion study is 
introduced as “one of those studies whose 
true purpose cannot be revealed to you right 
now" (Cook & Perrin, 1971). Subjects in 
these situations may be cued into the active 
faithful subject role. 


Conclusion 6: Subjects will be faithful only if 
no hypothesis has been learned and evaluation 
apprehension is relatively low 


In the six studies where subjects were given 
the hypothesis, this knowledge typically 
caused bias and so subjects could not have 
been faithful. Moreover, the faithful response 
is pitted against a biased response in each 
study that was reviewed here, and since most 
of the bias can be explained by evaluation 
apprehension, such apprehension may be pre- 
potent over the faithful role in many in- 
stances. 


Conclusion 7: Bias results from learning a 
hypothesis 


Evidence for this conclusion comes from 
studies in which subjects were or were not 
given a simple hypothesis. In most cases, 
hypothesis learning led to differences in task 
performance. (Editorial selectivity may in- 
flate this relationship, of course.) Indirect 
support for the conclusion comes from studies 
where interview or questionnaire data indi- 
cated knowledge of a hypothesis. Such knowl- 
edge was related to bias in one conformity 
study (Stricker et al., 1967) and among most 
subjects in conditioning experiments who were 
aware of the reinforcement contingencies. 
Conclusion 8: Bias is systematically related to
experimental paradigms 


In conditioning studies, bias was generally 
associated with high conditioning; in com- 
pliance and conformity studies, bias was as- 
sociated with low conformity; in persuasion 
studies, there was a trend (which needs fur- 
ther study) for bias to be associated with high 
persuasibility. There may be particular
manipulations of evaluation apprehension which
lead to low conditioning, high conformity,
or low persuasion. But it is noteworthy that
normal research within these paradigms is 
probably open to bias that operates in only 
one direction. The unidirectionality of bias 
might be less troublesome in studies of com- 
pliance than in studies of persuasion and con- 
ditioning since in these last instances bias 


operates in the same direction as expected 
findings. 


Conclusion 9: Prior experience of deceptions 
is not strongly related to bias 


There have been several experiments in 
which a prior deception and debriefing have 
immediately preceded a test experiment. In 
some cases the prior deception did not affect 
experimental performance (Brock & Becker, 
1966; Fillenbaum, 1966) or only affected it 
weakly in interactive contexts (Cook et al., 
1970; Cook & Perrin, 1971; Silverman et al., 
1970). Moreover, in one study, a series of five 
deceptions and debriefings spaced over 5 weeks 
failed to affect either attitude, incidental
learning, or task performance (Cook et al., 1970).
The effects of prior deception seem to be
surprisingly few in number and small in
magnitude.


Conclusion 10: Effects of prior participation
in experiments have been demonstrated only
in conditioning studies


Sophistication seems to enhance conditioning
(Holmes, 1967; Holmes & Appelbaum, [year illegible];
Page, 1968, 1969), provided that the experimental
history is not a dull one. It is not yet
clear whether these effects are local to
conditioning studies. Nor is it clear whether they
are mediated by motivational forces associated
with experience in experiments or by the
knowledge of psychology that comes from
participating in experiments or from taking
experiments late in the semester. Page ([year illegible])
did control for knowledge of psychology
through course work, and he found bias. This
tentatively supports a motivational [position]
over a cognitive one. However, the [cognitive]
approach seems “obvious” for explaining why
a relationship between sophistication and
performance should be limited to conditioning
studies, for conditioning is one area of
psychology that is extensively covered in the
introductory psychology courses from which
most subjects come. Indeed, Page's ([year illegible])
latest findings on subject sophistication and
operant conditioning support a cognitive
interpretation.

Some of the present conclusions have been
presaged by other writers. Thus, without
reviewing all of the relevant literature, Aronson
and Carlsmith (1968) concluded:


[...]cal experimentation, there is a good deal of evidence
that subjects attempt to put themselves in [the best]
possible light [p. 62].


Moreover, it is interesting to note how Orne's
definition of the good subject has changed over
time. In his famous 1962 statement, the good
subject was defined as someone who attempts
to validate the experimental hypothesis, a
phrase that was even italicized. By 1969, the
definition had become broader and is similar
to the definition of the apprehensive subject:


To be a good subject may mean many things: to
give the right responses, i.e., to give the kind of
response characteristic of intelligent subjects; to give
the normal response, i.e., characteristic of other
subjects; to give a response in keeping with the
individual’s self-perception, etc., etc. [Orne, 1969, p.
145].

Furthermore, Silverman and Shulman (1970)
pointed to the predominance of evaluation
apprehension in mediating artifacts in attitude
change studies, and they even noted that when
evaluation apprehension and being a good
subject are pitted against each other, responses
indicative of apprehensiveness are emitted.


IMPLICATIONS FOR IMPROVING RESEARCH ON
SUBJECTS’ BEHAVIOR IN LABORATORY 
EXPERIMENTS 


Empirical research on subject behavior in
experiments has a short history, and there are
considerable conceptual and methodological
improvements to be made. One conceptual
improvement would be to fit research
on subject roles into better developed theories
of human behavior. Negativism is akin to
Brehm's (1966) concept of psychological
reactance and, as Rosnow (1970) and Silverman
et al. (1970) have pointed out, negativism
can also be explained in terms of frustration
and aggression. It may be that the strongest
tests of whether subjects act negativistically,
and when they do so, will require the
manipulation of antecedents of reactance in
particular or of frustration in general. The
good subject can be interpreted as someone
who helps the “powerful” experimenter who
is dependent on him for data that confirm a
hypothesis. Conditions that facilitate the
helping of dependent others may be those that
[illegible] or inflate bias. But Argyris (1968)
[contended] that good data might result
because giving help is a tactic that powerless
persons use for making themselves liked,
thus preventing powerful persons from
using their power against them (Jones, 1964).
The faithful subject may also be helpful, but
he tries to help by providing data that
faithfully reflect nature. Thus, determinants
of helping may also cause subjects to become
faithful provided they know that science
requires honest data. However, faithful data
may also stem from apathy or from the
inability to act upon a particular motivational
state. Finally, the apprehensive subject role
is obviously related to the literature on
social [illegible] (Marlowe & Gergen, 1969)
and self-presentation (Goffman, 1959; Jones,
1965), but the explicit links still need to
be [made].

Integrating subject roles into better
developed theories may do more than suggest
new testable sources of bias in experiments. 
It may force scientists to escape from their 
orientation to bias (the dependent variable of 
practical concern) to the specification of 
testable antecedents of roles. At present, most 
of the “testing” of roles comes from ex post 
facto explanations of data in terms of highly 
general processes. There is little specification 
of testable antecedents of roles, and it is only 
when such specification has taken place that 
the principles of subject behavior will become 
clearer and rigorous controls for bias can be 
developed. 

Conceptual improvements have to be made 
in the basic questions that are asked about 
subject performance. One question that has 
been previously asked concerns the effects of 
prior deception. This problem may very well 
be too global to be useful since, as we saw, 
prior deception is not systematically related to 
bias. It is not even obvious how prior decep- 
tion should affect performance, for while prior 
deception might increase frustration when it 
is considered illegitimate, it might also make 
subjects more apprehensive about appearing to 
be “dupes” who fail to spot a hypothesis. As 
a consequence, two countervailing motiva- 
tional forces, negativism and evaluation ap- 
prehension, might be set up. If this happened, 
it would be difficult to understand the effects 
of prior deception. It might prove more bene- 
ficial in the long run to differentiate between 
legitimate and illegitimate deception, between 
deceptions that do or do not involve abilities 
and adjustment, and between deceptions where 
the debriefing is Socratic and subjects learn 
the hypothesis for themselves and debriefings 
that are less Socratic. Each of these different 
kinds of deception has a link to one or another 
of the four hypothetical subject roles, and 
they may have interpretable effects on per- 
formance in later experiments. 

Another problem, related to the deception 
problem, is that of understanding the con- 
sequences of suspiciousness (McGuire, 1969). 
Suspiciousness of what? Subjects can be suspi- 
cious of an experimenter’s cover story, suspi- 
cious of what is the real independent variable, 
suspicious about what is the referent of some 
measure, suspicious about the behavior of a 
fellow subject who might be an experimental
accomplice, suspicious about a curtain that 
might conceal part of a one-way mirror, suspi- 
cious about a panic button, or perhaps they 
can be chronically suspicious of everything. 
Moreover, it is not clear whether suspicious- 
ness might be more usefully conceived as a 
dichotomous variable (one is or is not suspi- 
cious) or as a continuous variable that might 
be called certainty of deception, or certainty 
of knowing a hypothesis. To conceptualize 
suspicion as relative certainty either about 
deception or about knowing a hypothesis per- 
mits testing whether bias and such certainty 
measures are related curvilinearly (some state- 
ments by Orne, 1962, imply this). It also 
highlights the possibility that the study of 
suspicion may be nothing more than the study 
of effects of deception or of knowing a
hypothesis.

With the possible exception of Riecken 
(1962) and Argyris (1968), work in the area 
of subject artifacts uses the concept of “role” 
but does not relate subject behavior to find- 
ings from “role theory” (see Biddle & Thomas, 
1966) either to explain past findings or to 
predict new ones. Instead, “role” is used al- 
most synonymously with motivation or set. 
This is unfortunate, for we still lack com- 
prehensive preexperimental interview data 
from subjects about the “script” they are
supposed to follow in an experiment and about
the “improvisations” (or deviations from the
“script”) that are permissible. We still do not
know who subjects think the “audience” is
for the “performance”: an experimenter, an
experimenter’s supervisor, a research-sponsoring
agency, universal science. Nor do we
know anything about how a subject conceives
the rewards and punishments that accrue from
playing his “part” well. In short, the major
variables in role theory suggest how little we
know about subjects’ perceptions of their role,
and they also suggest avenues for further
exploration into subject behavior.

However, any conceptual advances will be
relatively meaningless unless advances are
[...] to permit unambiguous [...] role or roles
that [...] techniques have to be developed for
[...] unique inference. Thus, if a study of
negativism is being conducted, it is imperative that
the good and apprehensive subject roles
predict the opposite of what negativism predicts.
This would probably be the clearest
demonstration of negativism. It might not be a [fair]
test of the role, however, since it would
require that pressures toward negativism be
strong enough to cancel out the countervailing
effects of a positive self-presentation. The
alternative to pitting negativism against
apprehensiveness would be to keep apprehensiveness
low and to hold it constant across treatment
conditions. It is clear, however, that if
a role-related analysis is to remain the [...]
one for understanding subject behavior,
experiments which test role constructs [must not]
have dependent variables which fail to permit
unambiguous inference about a role.

Studies on subject artifacts use the laboratory
to explore the behavior of subjects in
laboratories. Nonetheless, generalizing
findings from these experimental procedures is
difficult. Testing subject roles requires
hypothesis learning and powerful manipulations
of potential sources of bias. But in
“representative” research, hypotheses are neither
given nor made easily available, and artifacts
are correlated with treatments and are not
treatments themselves. It would be a great
methodological improvement if [...]
external validity controls. Thus, if a test of
subject roles requires telling a hypothesis to
subjects, others should not be informed. If
the hypothesis to be told is a simple one that
subjects can easily operate on, others should
be told a differentiated hypothesis that is less
easy to operate on, and others should only be
told what condition they are in (this is the
most information that subjects normally
have). In studies of the effects of prior
deception, some subjects should perform the test
experiment immediately after the prior
deception (when motivation is high and
knowledge is fresh), while others should [participate]
in the test experiment 1 or 2 weeks after a
deception. Furthermore, if an artifact is being 
manipulated, an external validity control 
group should be added in which the artifact is 
correlated with some treatment that was
originally designed to test a different kind of
theory. Indeed, the basic procedure for all studies
of subject artifacts should be taken with as
few changes as possible from procedures that
were designed for other than methodological
purposes. Research that tests artifacts will
have greater external validity if designs
involve high-frequency procedures that were
developed for testing general psychological 
theory. 


SUBJECT BEHAVIOR AND VALID INFERENCES 
ABOUT CAUSAL VARIABLES


There are three major threats to valid
inference. The first threat is of false positive
findings, and these occur when results which
confirm a hypothesis are attributed to a
theory-relevant construct when they were in fact
caused by subject artifacts. The second threat
is of false serendipitous findings. These might
occur when results do not confirm a hypothesis
and are attributed ex post facto to some
construct other than the subject artifact that did
cause them. The third threat is of false
negative findings. That is, results might fail to
reach conventional levels of statistical
significance, not because a particular effect is
absent in nature, but because the particular
role that subjects have adopted inflates error
variance and decreases statistical power, or
because some third variable acts as a
suppressant. Let us now relate these three kinds
of invalid inference to the major artifacts
revealed by this study: adopting an
apprehensive subject role, learning a hypothesis, and
adopting the faithful subject role.

Is there anything about experimentation
which dictates that when evaluation
apprehension is aroused, it will be correlated with
treatments? We think there may be in
experiments of the kind that McGuire (1968)
amusingly characterized as “Festingerian.”
Theorists of the methodology of this tradition
(Aronson & Carlsmith, 1968) have written:
“It is the major objective of a laboratory
experiment to have the greatest possible impact
on a subject within the limits of ethical
consideration and requirements of control [p.
23].” Furthermore, many treatments in this
tradition are unusual and take place in a 
face-to-face setting with an experimenter who 
gives little feedback. Such a situation should 
produce considerable arousal of evaluation ap- 
prehension, and it will presumably be stronger 
in the high-impact treatment conditions than 
in a no-treatment control group. In addition, 
there will often be differences in evaluation 
apprehension between treatment groups, for in 
the typical factorial design some treatments 
are more unusual than others. Since subjects 
are typically ignorant of what is appropriate 
in such unusual situations, they may look to 
the experimenter or situation for cues to guide 
behavior. Examples of treatments that prob- 
ably differ in these respects include being paid 
$20 versus $1; undergoing a mild versus a 
severe initiation into a group; being allowed 
to choose between careers that are equivalent 
versus discrepantly valent. Hence, evaluation 
apprehension may produce either false positive 
or false serendipitous findings whenever ex- 
perimenters use unusual manipulations de- 
signed to create an impact that is greater in 
some conditions than in others. 
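
The argument above can be made concrete with a small simulation. The sketch below is an illustration only, not an analysis taken from the studies reviewed. It assumes a hypothetical experiment in which the treatment has no true effect, but the high-impact condition raises evaluation apprehension, and apprehension in turn shifts the response. The function name, the parameters (`apprehension_gap`, `apprehension_weight`), and the normal distributions are all assumptions chosen for illustration.

```python
import random
import statistics


def mean_difference(n_per_group=500, true_effect=0.0,
                    apprehension_gap=1.0, apprehension_weight=0.8, seed=0):
    """Return mean(treatment) - mean(control) for one simulated experiment.

    Hypothetical model (an assumption, not from the source):
        response = treatment effect
                   + apprehension_weight * apprehension
                   + random noise
    """
    rng = random.Random(seed)

    def responses(treated):
        scores = []
        for _ in range(n_per_group):
            # Apprehension is higher, on average, in the treated condition
            # whenever apprehension_gap is greater than zero.
            apprehension = rng.gauss(apprehension_gap if treated else 0.0, 1.0)
            effect = true_effect if treated else 0.0
            scores.append(effect
                          + apprehension_weight * apprehension
                          + rng.gauss(0.0, 1.0))
        return scores

    control = responses(False)
    treatment = responses(True)
    return statistics.mean(treatment) - statistics.mean(control)


if __name__ == "__main__":
    # The true treatment effect is zero in both runs.
    confounded = mean_difference(apprehension_gap=1.0)
    balanced = mean_difference(apprehension_gap=0.0)
    print(f"apprehension correlated with treatment: {confounded:.2f}")
    print(f"apprehension equal across conditions:   {balanced:.2f}")
```

Under these assumptions the expected group difference is `apprehension_weight * apprehension_gap` even though `true_effect` is zero, which is the false positive case described in the text; when apprehension is equal across conditions, the difference hovers near zero.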

Valid inference can be threatened by false 
negative findings as well as by false positive 
ones, and it is possible that evaluation ap- 
prehension may sometimes increase error vari- 
ance in experimental designs. Subjects who 
are apprehensive may develop different no- 
tions of the meaning of their responses and 
consequently may act in different ways to 
enhance themselves. Page (1968, 1969; Page 
& Lumia, 1968) presented bimodal patterns 
of data and argued that such data indicate 
the operation of demand characteristics. Such 
bimodality will also increase the variance 
within conditions, and it becomes less likely 
that a hypothesis will be statistically sup- 
ported. Variances ought to receive greater at- 
tention in research on subject behavior. 
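
The link between bimodal responding and lost statistical power can be sketched numerically. The following is a minimal illustration under assumed values, not a reanalysis of Page's data: within each condition, half of the simulated subjects shift their response upward and half downward by a hypothetical amount `split`, while the true treatment effect stays fixed. The helper names (`welch_t`, `simulate`) and all parameter values are assumptions.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def simulate(n=1000, effect=0.5, split=0.0, seed=1):
    """Return (within-condition variance, t statistic) for one experiment.

    With split > 0, each subject adds +split or -split to the response
    with equal probability, producing a bimodal pattern within conditions.
    """
    rng = random.Random(seed)

    def group(shift):
        scores = []
        for _ in range(n):
            # Subjects interpret the "self-enhancing" response differently.
            mode = split if rng.random() < 0.5 else -split
            scores.append(shift + mode + rng.gauss(0.0, 1.0))
        return scores

    control = group(0.0)
    treatment = group(effect)
    return statistics.variance(treatment), welch_t(treatment, control)


if __name__ == "__main__":
    for split in (0.0, 2.0):
        variance, t_value = simulate(split=split)
        print(f"split={split}: variance={variance:.2f}, t={t_value:.2f}")
```

In this model the within-condition variance grows by roughly `split ** 2`, so the same true effect yields a smaller test statistic when responses are bimodal. This is one way to see why the text recommends inspecting variances and not only means.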

The threats to valid inference that result 
from hypothesis learning depend on the par- 
ticular hypothesis that is learned. If subjects 
learn the experimenter’s hypothesis and comply
with it, false positive findings will result.
If they want to disconfirm this hypothesis, or
if they learn a treatment-correlated hypothesis
that is not the experimenter's, false serendip- 
ity can result. Finally, false negative findings 
may result if subjects learn a variety of hy- 
potheses that are not treatment correlated, 
for acting on the basis of divergent hypotheses 
may sometimes increase error variance. 

But do subjects typically learn any one of 
these hypotheses? It is probably rare for dif- 
ferent subjects to learn different hypotheses 
that are not treatment correlated. Even if this 
happened, the resulting false negative findings 
would be less troublesome than false positive 
findings for the accumulation of knowledge. 
Troublesome false positive findings occur if 
the experimenter’s hypothesis or a treatment- 
correlated hypothesis is learned. Yet four sepa- 
rate considerations suggest that learning a 
hypothesis may be infrequent. First, there 
was some support for the passive version 
of the faithful subject, and it seems
unlikely that apathetic subjects [...].
[Second, ...] subjects may [...] underlying
hypothesis, but they did not learn it or they
did not act upon it. Third, subjects who did
[...] and were instructed to [...]
gave the same data [...] subjects.
Fourth, and most [...]


The foregoing problems relate to the
specification of experimental [...]. [Subject]
roles also pose a threat to external validity. As
far as faithful behavior is concerned, it may
be widespread especially in contexts where
there are no cues to evaluation apprehension
or to a hypothesis. The concern of faithful
subjects to follow instructions has consequences
that can be partially described by
considering the field of attitude change. The
subject who dutifully listens to a message from
an incompetent speaker is not like the person
who might switch off his television set if he
heard the same speaker; the subject who
faithfully fills out an attitude questionnaire is
not like the person who would [...]
answer or who would like to talk to [others] in
order to discover his opinion. For these and
many other reasons, the laboratory [study of]
persuasion is representative only of settings
where a message is learned in order to comply
with instructions.

Restrictions of external validity are likely
to be equally misleading in studies of
obedience, faithfulness, or aggression. That is, in
the special situation of the laboratory
experiment, where the powerless subject confronts
the powerful experimenter in a socially
legitimate status hierarchy, the predetermined
social nature of the interaction dictates that a
subject will obediently shock his peers more
than he might outside of the laboratory
(Milgram, 1963), or that he might faithfully
persist for hours in a meaningless task (Orne,
1962), or that he might not aggress against
an experimenter (Aronson & Carlsmith, 1968).
It is not that the laboratory is an invalid
site to study obedience, faithfulness, or
aggression. However, it is normally a site for
studying only that obedience, faithfulness, or
aggression which is embedded in a legitimate
power hierarchy. Of course, legitimacy can be
varied. But typically it is not. Moreover, it
might [...] scient[...] [experi]menters [...].

Evaluation apprehension can also [restrict]
external validity. If procedural cues [cause the]
arousal of apprehension about how [abilities and]
adjustment will be evaluated, this might [affect]
the absolute amount learned in one [study and]
the social desirability of personality ratings
in another. There may be even more severe
restrictions in other areas of social
psychology. For example, bargaining studies give the
impression of a social man who is more
competitive than cooperative. Yet the amount of
competitiveness may be more related to the
subject’s apprehensiveness about how the
experimenter will evaluate his bargaining
performance than to the high frequency of a
competitive orientation in real-life conflicts of
interest. Furthermore, studies of cognitive
consistency indicate a social man who is
motivated to achieve consistency within his
“cognitorium.” But, as Bem (1970) concluded
from a review of survey data, there seems to
be little evidence for such consistency outside
of the laboratory. The consistency found in
the laboratory may result because the
laboratory is simply where inconsistencies are made
salient. But, alternatively, it may be one of
the sites where persons are especially eager to
appear rational and consistent. Finally, it has
been frequently reported that persons seek
information selectively and in accordance with
opinions they already hold (Schramm &
Carter, 1959). But the evidence for selective
exposure from laboratory experiments is
extremely inconclusive (Freedman & Sears,
1965), and one possible explanation for this
is that subjects in experiments do not want to
appear biased. They prefer to appear open
and flexible, which is probably the cultural
ideal.

In summary, the roles that subjects adopt
can threaten both the interpretation of the
independent variable and the external validity
of experiments. False inferences about an
independent variable may be drawn whenever
an experimental hypothesis is learned, or
whenever self-presenting responses are correlated
with treatments. However, hypothesis learning
is likely to be infrequent in most
experiments which are complex in design and where
procedures do not permit a simple hypothesis
to be learned easily. It is not clear how
frequently evaluation apprehension is correlated
with treatments, and this question is probably
the most important single question raised by
the present review. On it depends the [extent]
of the threat of subject behavior to correct
causal inference in experiments, for
neither the faithful subject role nor hypothesis 
learning is likely to affect inferences about 
treatments with any great frequency. 
However, irrespective of the frequency of 
treatment-correlated artifacts, it is clear that 
there are procedure-correlated artifacts that 
seriously limit the range over which results 
from laboratory experiments can be general- 
ized. Results will not only be specific to col- 
lege students and to volunteers among these; 
they will also be restricted to a narrow mo- 
tivational range that primarily encompasses 
evaluation apprehension or faithfulness. Of 
course, restricted generalizability is endemic to 
the behavioral sciences (see Campbell, 1969), 
and generalization beyond the setting of mea- 
surement is logically impossible. In most field 
research, that setting is in some sense eco- 
logically valid, whereas this is not the case 
for the laboratory setting. Thus, laboratory 
experiments typically have limited utility for 
a description of social behavior, and the ab- 
solute levels of persuasibility, obedience, con- 
sistency, openness, and interpersonal competi- 
tion may be systematically overestimated and 
valid for no other setting. If so, this may have 
harmful consequences for the incidental so- 
cialization of scientists who are exposed ex- 
clusively to laboratory studies of personality 
and social psychology. A wider contact with 
naturalistic studies, both experimental and 
otherwise, is required as a corrective for the 
systematically distorted picture of social be- 
havior that emerges because of the roles that 
subjects may adopt in laboratory experiments. 


IMPLICATIONS FOR THE IMPROVEMENT OF 

LABORATORY EXPERIMENTS IN GENERAL 

It is obvious from the previous review that 
all experiments should be designed so that 
hypotheses are difficult to learn. This should 
be a sine qua non of experimentation, espe- 
cially where designs are simple. Orne (1969) 
has suggested that quasi-controls be used in 
pilot testing to assess the effect of demand 
characteristics. He mentioned three types of 
quasi-controls: (a) the postexperimental in- 
quiry; (b) the nonexperiment, where the ex- 
perimental equipment is shown to the subject, 
the procedures are described to him, the in- 
structions are read to him, and he is then
asked to produce data as if he had actually 
received the experimental treatment; (c) 
simulators, persons who pretend to have re- 
ceived the treatment and who give data in the 
presence of an experimenter who is blind to 
their status. From these techniques, informa- 
tion may be gained about the likelihood that 
the hypothesis can be learned. 

However, hypothesis learning is less of a 
problem than the activation of subject roles. 
The experimenter’s task, it might be thought, 
is to prevent subjects from adopting roles. 
In some cases this can be done by conducting 
experiments outside of the laboratory with 
treatments and measurement that are
unobtrusive. However, despite their potential
advantages in decreasing measurement and
treatment artifacts, field experiments are not likely
to be an effective alternative to laboratory ex- 
periments. Most field experiments are in- 
transigent to multifactorial design, to flexible 
manipulation of theory-relevant variables, and 
to a rich collection of dependent variables that 
allows an effect to be specified. Another
alternative is to use populations in which
members have not learned the role or roles that
are related to being a subject in an
experiment (e.g., children and noncollege
adolescent or adult samples). But these populations
will become contaminated as persons learn 
about experimentation from formal or in- 
formal sources and as they associate experi- 
menters with authority figures. 

Aronson and Carlsmith (1968) have
proposed another alternative. They contend that
a general goal of experimentation should be
the creation of experimental realism. They
mean by this, first, that treatments should be
so “psycho-logically” related to a procedure
that they do not arouse suspicion and,
second, that the treatments should have enough
impact that subjects become absorbed in them
and become oblivious to considerations of
role behavior. In a sense, the second aim of
experimental realism is to bring into
experiments a motivational force that overrides any
other force. [...] respect to the good or
[...] it is presumably
difficult to learn a hypothesis and to act
according to it when one is highly involved in
the treatment and procedure. For the same 
reason, it is probably difficult to be passively 
faithful. But it is not so clear that high ex- 
perimental realism invariably rules out ap- 
prehensiveness. There will be cases, of course, 
where both experimental realism and evalua- 
tion apprehension will be high and where 
evaluation apprehension will not be a threat to 
validity since it is the process that [mediates]
findings. The conformity experiments of Asch
(1956) and the obedience experiments of
Milgram (1963) are of this type. Invalid
inference will only occur when evaluation
apprehension is not postulated to be the mechanism
that mediates effects. When a person is paid
$20 to argue against his belief, this sum may
have consequences that are totally [engrossing].
But in novel situations with high arousal,
subjects cannot respond in some habitual [way]
and must look to other persons, like the
experimenter, for cues which indicate
appropriate responses. Although experimenters
normally try to eliminate such cues, the involving
but ambiguous situation may motivate
subjects to find some cue even when the
experimenter intends none. Since these [cues]
may frequently be correlated with treatments,
it will be unclear whether apprehensiveness or
treatments have caused any measured [effects].
These speculations suggest that evaluation
apprehension may sometimes be correlated with
both experimental realism and experimental
treatments and that this pattern of
interrelations may pose a threat to inference
unless evaluation apprehension is the construct
that is used to explain treatment effects.

The implication of this is that experimental
realism, while a desirable goal in some
contexts, is not a panacea. Of course, attempts
can be made to reduce apprehensiveness.
Experimenters should not be high-status
persons. They should not control rewards and
punishments. The more tangible sources of
reward power (credit, payment, etc.) should
be signed away before the experiment begins.
Experimenters should not be labeled
psychologists and should not, explicitly or
implicitly, claim skills in evaluating ability or
adjustment. Furthermore, the experiment
should be designed so that all responses are
at least anonymous. Experiments should also
aim, wherever possible, to have subjects
perform experimental tasks in settings where they
cannot look to the experimenter for
performance feedback. Finally, the response on the
dependent variables should be as dissociated
as possible from the time and place in which
the subject underwent the treatment. If such
procedures could be coupled with
experimental realism, then valid inference would be
all the more likely. Unfortunately, however,
many factors which decrease evaluation
apprehension will also decrease experimental
realism.

Indeed, attempts to lower apprehension
could have the unfortunate incidental
consequence of making subjects think that an
experiment is trivial, and this will attenuate
the impact of treatments. Hence, procedures
to lower evaluation apprehension should be
coupled with explicit statements that cue
subjects into the faithful subject role. A serious
task orientation, small groups of subjects who
think their individual but anonymous
responses are needed, preliminary reference to
the importance of a project: these factors may
cue subjects into becoming faithful without
informing them of a hypothesis or arousing
their evaluation apprehension. Such a
framework for conducting studies is probably most
useful wherever procedures with high
experimental realism cannot be used because of a
possible correlation with evaluation
apprehension. Deliberate cueing into the faithful
subject role will reduce external validity, but the
primary goal of experiments is to make valid
causal inferences, and these are not restricted
when subjects are faithful.

Another goal of experimental design should
be the development of treatments that differ
essentially in impact and minimally in
observables. Treatments should create as large
a psychological difference as possible on the
dimension of theoretical interest, while at the
same time they should minimize the number
of situational cues that are irrelevant to the
theoretical construct and that might vary with
treatments (perhaps to make the [treatments
plausible]). These incidental observables [make]
valid inference all the more difficult [by]
increasing the chances of evaluation apprehension
being differently aroused across experimental
conditions (Orne & Scheibe, 1964).
The best treatments differ in one sentence, a 
single value, or a single element in the physi- 
cal arrangement of the experiment. The 
strategy of maximal impact and minimal dif- 
ference will not rule out treatment-correlated 
evaluation apprehension. But it may make 
evaluation apprehension less likely by mini- 
mizing treatment-correlated cues that arouse 
apprehension or that direct responses which 
are congruent with it. 


CONCLUSION 


The most important implications of the 
present study concern the validity of infer- 
ences that can be made from laboratory ex- 
periments about human behavior. As far as 
valid inference about causal treatments is con- 
cerned, neither hypothesis learning nor the 
faithful subject role is likely to lead to false 
positive, false negative, or false serendipitous 
findings if care is taken to camouflage hy- 
potheses. But adoption of the apprehensive 
subject role may threaten valid causal in- 
ference wherever it is treatment correlated. It 
is not yet clear when this is the case, but the 
problem is a crucial one. On it depends 
whether subject bias is frequent or infrequent 
and whether, in terms of the historical review 
that began this study, subject artifacts have 
been more exaggerated by early researchers 
than is warranted by the results of subsequent 
research. Valid inference also concerns ex- 
ternal validity—the range of generalizability. 
This review indicates that laboratory findings 
are probably not generalizable beyond a small 
population of subjects and beyond a narrow 
range of motivations within this population. 
This restricted range of motivation could ac- 
count for some of the discrepancies between 
the findings of laboratory and field research.


Why does a parent physically abuse his or
her own child? During the past 10 years, 
many attempts have been made to answer 
this question. An extensive literature has
emerged on the medical and legal aspects of 
the problem of child abuse since the publica- 
tion of an article by Kempe, Silverman, 
Steele, Droegemueller, and Silver (1962) and 
the pursuit of child-protective laws in Cali- 
fornia by Boardman (1962, 1963). Sociologists 
and social workers have contributed their 
share of insights, and a few psychiatrists 
have published their findings, but surprisingly 
little attention has been devoted to the 
problem of child abuse by the psychologist. 
One seeks with little success for well-de- 
signed studies of personality characteristics 
of abusing parents. What appears is a litera- 
ture composed of professional opinions on 
the subject.

The aim of this review is to bring together
professional opinions of this decade on the
psychological characteristics of the abusing
parent, in order to determine from the most
commonly held opinions what generalizations
can be induced and thus to lay the groundwork
for systematic testing of hypotheses.


DEFINITION


What is child abuse? Kempe et al. (1962)
limited their study to children who had received
serious physical injury, in circumstances
which indicated that the injury was
caused willfully rather than by accident.
They coined the term “battered child” to
encompass their definition. Zalba (1966),
after a brief review of definitions, likewise
addressed himself primarily to those cases in
which physical injury was willfully inflicted
on a child by a parent or parent substitute.

Because of the difficulty of pinpointing
what is emotional or psychological or social
neglect and abuse, and because of the extent
of the literature on physical abuse,
this review, following Kempe's and Zalba's
lead, limits the term “child abuse” to the
concept of physical injury to the child, willfully
inflicted. The review omits studies of
parents who neglect their children—emotionally,
socially, or psychologically—and adults
who sexually molest them.


MEDICAL AND LEGAL HISTORY


Literature on the medical and legal aspects
of the problem of child abuse is extensive.
The edited volume of Helfer and Kempe
(1968) contains a general overview, as do the
articles by Paulson and Blake (1967), Silver
(1968), and Zalba (1966). Legal aspects are
delineated in De Francis (1970), McCoid
(1965), and the various articles by Paulsen
(1966a, 1966b, 1967, 1968a, 1968b). Simons
and Downs (1968) gave an overview of patterns,
problems, and accomplishments of the
child-abuse reporting laws. A thorough bibliography
on child abuse was published by the
United States Department of Health, Education
and Welfare (1969).

This review is not concerned with the medical
and legal aspects of the problem and refers
only to those articles that gave more
than a passing mention to the psychological
and social determinants of parental abuse of
children.


REVIEW OF THE LITERATURE


Most of the studies of child abuse are subject
to the same general criticism. First, the
studies that set out to test specific hypotheses
are few. Many start and end as broad studies
with relatively untested common-sense assumptions.
Second, in most studies in this
area, the researchers used samples easily
available from ready-at-hand local populations,
and thus the samples were not truly
representative. We shall have to rely on the
convergence of conclusions from various types
of sampling to establish generalizations.
Third, practically all of the research in child
abuse is ex post facto. What is left unanswered
and still to be tested is whether one
can determine prior to the onset of abuse
which parents are most likely to abuse their
children, or whether high-risk groups can
only be defined after at least one incident
of abuse has occurred.

In spite of these criticisms, the studies of
child abuse do give general data that can
furnish hypotheses for more rigorous research
design, and for a more differentiated approach
to the question of why parents abuse
their children.


Demographic Characteristics


Regarding the demographic characteristics of
abusing families, Kempe et al. (1962) found in
the abusing families a high incidence of divorce,
separation, and unstable marriages,
as well as of minor criminal offenses. The
children who were abused were very young,
often under one year of age. In many of the
families, children were born in very close
succession. Often one child would be singled 
out for injury, the child that was the victim 
of an unwanted pregnancy. 

Various other studies enter figures from 
their own samples, generally repeating 
Kempe’s findings (Birrell & Birrell, 1968; 
Cameron, Johnson, & Camps, 1966; Ebbin, 
Gollub, Stein, & Wilson, 1969; Elmer & 
Gregg, 1967; Gregg & Elmer, 1969; Helfer 
& Pollock, 1967; Johnson & Morse, 1968; 
Nurse, 1964; Schloesser, 1964; Skinner & 
Castle, 1969). 

Elmer (1967) and Young (1964) add to 
Kempe’s findings the factors of social and 
economic stress, lack of family roots in the 
community, lack of immediate support from 
extended families, social isolation, high mobil- 
ity, and unemployment. 

While pointing to the role that economic 
and social stresses play in bringing out un- 
derlying personality weaknesses, the majority 
of the foregoing authors caution that eco- 
nomic and social stresses alone are neither 
sufficient nor necessary causes for child abuse. 
They point out that, although in the so- 
cially and economically deprived segments of 
the population there is generally a higher 
degree of the kinds of stress factors found in 
abusing families, the great majority of de- 
prived families do not abuse their children. 
Why is it that most deprived families do not 
engage in child abuse, though subject to the 
same economic and social stresses as those 
families who do abuse their children? 

A study that sheds light on the fact that 
social and economic factors have been over- 
stressed as etiological factors in cases of child 
abuse is that of Steele and Pollock (1968), 
whose sample of abusers consisted mainly of 
middle-class and upper-middle-class families.
Though social and economic difficulties may 
have added stress to the lives of the parents, 
Steele and Pollock considered these stresses 
as only incidental intensifiers of personality- 
rooted etiological factors. 

Simons, Downs, Hurster, and Archer (1966)
conducted a thorough study delineating abusing
families as multiproblem families in which
not the socioeconomic factors alone, but the
interplay of mental, physical, and emotional
stresses underlay the abuse.

Allowing that child abuse in many cases
may well be the expression of family stress,
Adelson (1961), Allen, Ten Bensel, and
Raile (1969), Fontana (1968), Holter and
Friedman (1968), and Kempe et al. (1962)
considered psychological factors as of prime
importance in the etiology of child abuse.
There is a defect in character structure which,
in the presence of added stresses, gives way
to uncontrolled physical expression.

Paulson and Blake (1969) referred to the
deceptiveness of upper- and middle-class
abusers, and cautioned against viewing abuse
and neglect as completely a function of educationally,
occupationally, economically, or socially
disadvantaged parents, or as due to
physical or health impoverishment within a
family.

If it is true that the majority of parents in 
the socially and economically deprived seg- 
ments of the population do not batter their 
children, while some well-to-do parents en- 
gage in child abuse, then one must look for 
the causes of child abuse beyond socioeco- 
nomic stresses. One of the factors to which 
one may look is parental history. 


Parental History


One basic factor in the etiology of child 
abuse draws unanimity: Abusing parents were 
themselves abused or neglected, physically or 
emotionally, as children. Steele and Pollock 
(1968) have shown a history of parents hav- 
ing been raised in the same style that they 
have recreated in the pattern of rearing their 
own children. As infants and children, all of
the parents in the groups were deprived both
of basic mothering and of the deep sense of
being cared for and cared about from the beginning
of their lives.


Fontana (1968) [illegible] emotionally [illegible].

In a study surveying [illegible] imprisoned
for cruelty to their children, Gibbens and
Walker (1956) concluded that it
was rejection, indifference, and hostility in
their own childhood that produced the cruel
parents.
Ten years later, Tuteur and Glotzer (1966)
studied 10 mothers who were hospitalized for
murdering their children and found that all
had grown up in an emotionally cold and
often overtly rejecting family environment
in which parental figures were either absent
or offered little opportunity for wholesome
identification when present.

Komisaruk (1966) found as the most striking
statistic in his study of abusing families
the emotional loss of a significant parental
figure in the early life of the abusive parent.

Perhaps the most systematic and well-controlled
study in the area of child abuse, that
of Melnick and Hurley (1969), compared
two small, socioeconomically and racially
matched groups on 18 personality variables.
Melnick and Hurley found, among other
things, a probable history of emotional deprivation
in the mothers’ own upbringing.

Further support for the hypothesis that
the abusing parent was once an abused or
neglected child is found in Bleiberg (1965),
Blue (1965), Corbett (1964), Curtis (1963),
Easson and Steinhilber (1961), Fairburn and
Hunt (1964), Fleming (1967), Green [year illegible],
Harper (1963), Kempe et al. (1962), McHenry,
Girdany, and Elmer (1963), Morris,
Gould, and Matthews (1964), Nurse (1964),
Paulson and Blake (1969), Silver, Dublin,
and Lourie (1969b), and Wasserman (1967).

In a summary statement, Gluckman (1968),
repeating the findings of earlier observers, drew
up a 10-point differential diagnosis chart.
His main point, and the point of this section
of the review, is that the child is the father
of the man. The capacity to love is not inherent;
it must be taught to the child. Character
development depends on love, tolerance,
and example. Many abusing parents were
raised without this love and tolerance.


Parental Attitudes toward Child Rearing


In addition to concurring on the fact that
many abusing parents were themselves raised
with some degree of abuse or neglect, the
authors agreed that the abusing parents share
common misunderstandings with regard to
the nature of child rearing, and look to the
child for satisfaction of their own parental
emotional needs.

Steele and Pollock (1968) found that the 
parents in their study group expected and de- 
manded a great deal from their infants and 
children, and did so prematurely. The parents 
dealt with their children as if older than they 
really were. The parents felt insecure and
unsure of being loved, and looked to their 
children as sources of reassurance, comfort, 
and loving response, as if the children were 
adults capable of providing grown-up com- 
fort and love. 

Melnick and Hurley (1969), in their well-controlled
study of personality variables, also
found in the mothers severely frustrated dependency
needs, and an inability to empathize
with their children.

Galdston (1965) concurred that abusing
parents treated their children as adults, and
added that the parents were incapable of
understanding the particular stages of development
of their children.

Bain (1963), Gregg (1968), Helfer and
Pollock (1967), Hiller (1969), Johnson and
Morse (1968), Korsch, Christian, Gozzi, and
Carlson (1965), and Morris and Gould (1963)
also reported that abusing parents have a
high expectation and demand for the infant’s
or child's performance, and a corresponding
disregard for the infant’s or child’s own needs,
limited abilities, and helplessness. Wasserman
(1967) found that the parents not only
considered punishment a proper disciplinary
measure but strongly defended their right to
use physical force.

In a 1969 study, Gregg and Elmer, comparing
children accidentally injured with those
abused, judged that the mother’s ability to
keep up the personal appearance of the child
when well, and her ability to provide medical
care when the child was moderately ill,
sharply differentiated the abusive from the
nonabusive mothers.

The authors seem to agree that abusing
parents lack appropriate knowledge of child
rearing, and that their attitudes, expectations,
and child-rearing techniques set them apart
from nonabusive parents. The abusing parents
implement culturally accepted norms for raising
children with an exaggerated intensity
and at an inappropriately early age.


Presence of Severe Personality Disorders 


There has been an evolution in thinking re- 
garding the presence of a frank psychosis in 
the abusing parent. Woolley and Evans 
(1955) and Miller (1959) posited a high in- 
cidence of neurotic or psychotic behavior as 
a strong etiological factor in child abuse. 
Cochrane (1965), Greengard (1964), Platou, 
Lennox, and Beasley (1964) and Simpson 
(1967, 1968) concurred. Adelson (1961) and 
Kaufman (1962) considered only the most 
violent and abusive parents as having schizo- 
phrenic personalities. Kempe et al. (1962), 
allowing that direct murder of children be- 
trayed a frank psychosis on the part of the 
parent, found that most of the abusing par- 
ents, though lacking in impulse control, were 
not severely psychotic. By the end of the 
decade, the literature seemed to support the 
view that only a few of the abusing parents 
showed severe psychotic tendencies (Fleming, 
1967; Laupus, 1966; Steele & Pollock, 1968; 
Wasserman, 1967). 


Motivational and Personality Variables: A 
Typology 

A review of opinions on parental personality
and motivational variables leads to a con-
glomerate picture. While the authors gen- 
erally agree that there is a defect in the abus- 
ing parent's personality that allows aggressive 
impulses to be expressed too freely (Kempe 
et al., 1962; Steele & Pollock, 1968; Wasser- 
man, 1967), disagreement comes in describing 
the source of the aggressive impulses. 

Some authors claim that abuse is a final 
outburst at the end of a long period of ten- 
sion (Nomura, 1966; Ten Have, 1965), or 
that abuse stems from an inability to face 
life’s daily stresses (Heins, 1969). Some claim 
that abuse stems from deep feelings of in-
adequacy or from parental inability to fulfill
the roles expected of parenthood (Cohen, 
Raphling, & Green, 1966; Court, 1969; Fon- 
tana, 1964; Johnson & Morse, 1968; Komi- 
saruk, 1966; Silver, 1968; Steele & Pollock, 
1968). Others described the parents as immature,
self-centered, and impulse-ridden
(Cochrane, 1965; Delaney, 1966; Jacobziner,
1964; Ten Bensel, 1963).

Some authors consider a role reversal be- 
tween the spouses as a prime factor in the 
etiology of child abuse. A home in which the 
father is unemployed and the mother has 
taken over the financial responsibility of the 
family is considered a breeding ground for 
abuse (Galdston, 1965; Greengard, 1964; 
Nathan, 1965; Nurse, 1964). 

Finally, there are those authors who con- 
sidered low intelligence as a prime factor in 
the etiology of child abuse (Fisher, 1958; 
Simpson, 1967, 1968), although this point is 
disputed in the findings of Cameron et al. 
(1966), Holter and Friedman (1968), Kempe 
et al. (1962), and Ounsted (1968). 

Is there a common motivational factor behind
child abuse? Is there only one “type”
of abusing parent? Realization that each of
the above described characteristics was found
to exist at least in some individual circumstances
has led some authors to group together
certain characteristics in clusters, and
to evolve a psychodynamic within each
cluster. The first major attempt at a typology
was made by Merrill (1962). Because Mer- 
rill’s typology is the most often quoted, it is 
summarized in some detail. 

Merrill identified three distinct clusters of 
personality characteristics that he found to 
be true both of abusing mothers and fathers, 
and a fourth that he found true of the abusing 
fathers alone. The first group of parents 
seemed to Merrill to be beset with a con- 
tinual and pervasive hostility and aggressive- 
ness, sometimes focused, sometimes directed 
at the world in general. This was not a con- 
trolled anger, and was continually with the 
parents, with the only stimulation needed for 
direct expression being normal daily difficulties.
This angry feeling stemmed from con-
flicts within the parents and was often rooted 
in their early childhood experiences. 

The second group Merrill identified by personality
characteristics of rigidity, compulsiveness,
lack of warmth, lack of reasonableness,
and lack of pliability in thinking and
belief. These parents defended their right
to act as they had in abusing their child.
This group had marked child-rejection
attitudes, evidenced by their primary
concern with their own pleasures, inability to
feel love and protectiveness toward their
children, and in feelings that the children
were responsible for much of the trouble
being experienced by themselves as parents.
These fathers and mothers were extremely
compulsive in their behavior, demanding excessive
cleanliness of their children. Many of
these parents had great difficulty in relaxing,
in expressing themselves verbally, and in exhibiting
warmth and friendliness.
Merrill’s third group of parents showed
strong feelings of passivity and dependence.
Many of these parents were people who were
unassuming, reticent about expressing their
feelings and desires, and very unaggressive.
They were individuals who manifested strong
needs to depend on others for decisions. These
mothers and fathers often competed with
their own children for the love and attention
of their spouses. Generally depressed, moody,
unresponsive, and unhappy, many of these
parents showed considerable immaturity.

Merrill’s fourth grouping or cluster of personality
characteristics included a significant
number of abusing fathers. These fathers
were generally young, intelligent men with
acquired skills who, because of some physical
disability, were now fully or partially unable
to support their families. In most of these
situations, the mothers were working, and the
fathers stayed at home, caring for the children.
Their frustrations led to swift and
severe punishment, to angry, rigid discipline.

Two further attempts at classification, Delsordo
(1963) and Zalba (1967), with some
modifications, can be reduced to Merrill’s
categories.

The use of categories seems simple, unifying,
and time saving. If further work were
done in refining the categories, validating
them in field research, perhaps they or any
clusters shown to be empirically valid might be
used as an aid in the determination of high-risk
parents.

In this section, we have seen a conglomerate
picture of parental motivational and personality
variables, with one author's attempt to
cluster the characteristics into a workable
unity. One basic fact of agreement emerges
from the studies in this section. The authors
feel that a general defect in character—from
whatever source—is present in the abusing
parent allowing aggressive impulses to be expressed
too freely. During times of additional
stress and tension, the impulses express themselves
on the helpless child.


CRITIQUE OF A SURVEY 


Of the studies surveying the demographic
characteristics of families in which child abuse
has occurred, the most extensive in scope was
the national survey undertaken by Gil (1968a,
1968b, 1969). In 1969, Gil reported that the
phenomenon of child abuse was highly concentrated
among the socioeconomically deprived
segments of the population. Concluding
that “physical abuse is by and large not very
serious as reflected by the data on the extent
and types of injury suffered by the children
in the study cohort [p. 862],” Gil placed his
intervention strategy in the general betterment
of society. For Gil, the cultural attitude
permitting the use of physical force in child
rearing is the common core of all physical
abuse of children in American society. Since
he found the socioeconomically deprived relying
more heavily on physical force in rearing
children, he recommended systematic educational
efforts aimed at gradually changing
this cultural attitude, and the establishment
of clear-cut cultural prohibitions against the
use of physical force as a means of rearing
children. He viewed this educational effort
as likely to produce the strongest possible reduction
in the incidence and prevalence of
physical abuse of children.

For Gil, child abuse is ultimately the result
of chance environmental factors. While
admitting to various forms of physical, social,
intellectual, and emotional deviance and pathology
in caretakers, and in the family units
to which they belong, Gil stressed a global
control of environmental factors as the solution
to the problem of child abuse. He suggested:
(a) the elimination of poverty from
the midst of America's affluent society; (b)
the availability in every community of resources
aimed at the prevention and alleviation
of deviance and pathology; (c) the
availability of comprehensive family planning
programs and liberalized legislation concerning
medical abortions, to reduce the number
of unwanted children; (d) family-life education
and counseling programs for adolescents
and adults in preparation for and after marriage,
to be offered within the public school
system; (e) a comprehensive, high-quality,
neighborhood-based national health service, to
promote and assure maximum feasible physical
and mental health for every citizen; (f) a
range of social services geared to the reduction
of environmental stresses on family life;
and (g) a community-based system of social
services geared to assisting families and children
who cannot live together because of
severe relationship problems. Gil's ultimate
objective is “the reduction of the general
level of violence, and the raising of the general
level of human well-being throughout
our entire society [p. 863].”

* Gil's book reporting his national findings (Violence
against children: Physical child abuse in the
United States. Cambridge, Mass.: Harvard University
Press, 1970) appeared after the present review
was accepted for publication. Although the book
gives greater detail, the findings and conclusions are
identical to those in the cited references.

While one must praise the efforts of the 
Gil study in data collection, and the ultimate 
objective of reducing the general level of 
violence and raising the general level of hu- 
man well-being in our entire society, one 
cannot help but feel that Gil did not address 
himself to the question of child abuse. If 
there really does exist as strong a link as Gil 
suggests between poverty and physical abuse 
of children, why is it that all poor parents 
do not batter their children, while some well- 
to-do parents engage in child abuse? Eliminat- 
ing environmental stress factors and better- 
ing the level of society at all stages may re- 
duce a myriad of social ills and may even 
prove effective, indirectly, in reducing the 
amount of child abuse. But there still remains 
the problem, insoluble at the demographic 
level, of why some parents abuse their chil- 
dren, while others under the same stress fac- 
tors do not. 

Other authors throughout the decade have
allowed for the types of services outlined by
Gil, but less globally and in a manner less
disregarding of parental personality factors.
That raising the general educational and
financial level of families that are socioeconomically
deprived is of long-range value in
the lessening of the prevalence of child abuse
is generally agreed upon, and finds support
throughout the literature. However, most of
the authors explicitly caution against con- 
sidering abuse, as does Gil, as a function 
solely of educational, occupational, economic, 
or social stresses. This point is made by Adel- 
son (1961), Allen et al. (1969), Elmer 
(1967), Fontana (1968), Helfer and Pollock 
(1967), Holter and Friedman (1968), Kempe 
(1968), Kempe et al. (1962), Paulson and 
Blake (1967), Silver et al. (1969a, 1969b), 
and Steele and Pollock (1968). 

The great majority of the authors cited in 
this review have pointed to psychological 
factors within the parents themselves as of 
prime importance in the etiology of child 
abuse. They see abuse as stemming from a 
defect in character leading to a lack of in- 
hibition in expressing frustration and other 
impulsive behavior. Socioeconomic factors 
sometimes place added stress on the basic 
weakness in personality structure, but these 
factors are not of themselves sufficient or 
necessary causes of abuse. 


CONCLUSIONS 


The purpose of this review has been to
bring together the published professional 
opinions on the psychological characteristics 
of the abusing parent, in order to determine 
from the most commonly held opinions what 
generalizations can be induced, and thus to 
lay the groundwork for more systematic test- 
ing of hypotheses. 

The psychologist, both as a specialist in
the functioning of the human as an individual,
and as a scientist trained in research
methodology, is in a unique position to test
the hypotheses raised by professionals in the
fields of medicine and social work, in the
study of the personality characteristics of the
abusing parent.

Certainly, one would hope that research can
eventually develop criteria [illegible]
those inadequate [illegible] professional help,
can meet the needs of their children [illegible]
determine after the fact of abuse which
families must receive the most attention to
assure the further safety of their child.


Following a review of empirical research on the role of reinforcement in
learning-set formation, the major theoretical explanations of learning-set
formation in monkeys are analyzed. Studies showing that a reward can function
to decrease as well as increase the probability of choosing an object cast
doubt upon theories based on an automatic strengthening function of reward.
Hypothesis or strategy selection theories avoid this problem by assuming
hypotheses, rather than responses, are subject to reinforcement principles, but
hypothesis theories are at best incomplete in their treatment of retention.
A theory which assumes that learning-set formation results from between-problem
stimulus generalization of feedback from expected rewards is consistent
both with retention studies and with experiments on the function of reward
in learning set, suggesting that learning-set formation need not be considered
a complex abstractive process.


When monkeys are given a series of two-object
discriminations, their performance
on new problems gets better and better.
They start out making 50% errors on the
second trial of new problems, but after being
tested on a few hundred problems make only
[illegible]% or fewer errors on Trial 2 of new
problems. The monkeys, so to speak, “learn how
to learn.” This improvement is not attributable
to the problems having a common solution,
since the object associated with reward in one
problem may not be associated with reward in
a subsequent problem. Nor is this improvement
simply a matter of the monkeys
adapting to the experimental situation—subjects
typically receive extensive pretraining
in displacing objects and picking up rewards
before the learning-set (LS) training begins.
Now that these uninteresting explanations of
LS can be rejected, a major theoretical problem
is to specify the source of this improvement
in learning ability.

This study examines the major theoretical
explanations of discrimination LS after a review
of the principal findings on the role of
reward in LS in monkeys. The review borrows
heavily from and supplements the excellent
earlier summaries by Harlow (1959), Reese
(1964), and Miles (1965).

1 This research was supported by [illegible]
Service. W. K. [illegible] provided valuable
comments on earlier drafts of this paper.
Requests for reprints should be sent to Douglas
L. Medin, [illegible] University, New York,
New York.


REVIEW OF LITERATURE


Basic Learning-Set Procedures and Data

In a typical LS experiment, subjects are
given a series of simultaneous discrimination
problems having the following procedural
characteristics: (a) a pretraining involving
single objects being displaced for rewards; (b)
a small, fixed number of trials on each problem;
(c) a different pair of stimuli for each
problem; (d) a noncorrection procedure; (e)
a reward for every correct response; (f) an intertrial
interval of 10-20 seconds. The basic
measure of LS performance is improvement in
within-problem learning as a function of number
of problems given. Subjects cannot improve
their performance on Trial 1 across
problems as it functions solely as an information
trial.

Harlow (1949) performed the first experi- 
ment specifically designed to study between-
problem transfer in monkeys.² In this classic
LS study, monkeys were tested in 344 inde- 
pendent discrimination problems in a Wis- 
consin General Test apparatus. On the initial 
problems, learning improved gradually across 
trials, and subjects averaged 52% correct 
responses on Trial 2 of the first 8 problems. 
On the last set of discriminations, virtually 
all the learning took place between the first 
and second trials of a problem— subjects 
averaged over 90% correct responses on Trial 
2 for the last 112 problems. This apparent 
change from gradual or trial-and-error learn- 
ing to immediate solution of new problems 
Harlow labeled “learning set formation.” 
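
The Trial 2 measure described above can be tabulated in a few lines. The following is a minimal sketch only, assuming each problem is scored simply as correct or incorrect on Trial 2; the function name `trial2_percent_correct`, the block size, and the outcome data are hypothetical illustrations and are not taken from Harlow's (1949) records.

```python
from typing import List, Sequence


def trial2_percent_correct(trial2_outcomes: Sequence[bool],
                           block_size: int) -> List[float]:
    """Return percent correct on Trial 2 for consecutive blocks of problems.

    trial2_outcomes[i] is True if the subject chose the rewarded object
    on Trial 2 of problem i (hypothetical scoring scheme).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    percentages = []
    for start in range(0, len(trial2_outcomes), block_size):
        block = trial2_outcomes[start:start + block_size]
        # sum() counts True values; the last block may be shorter.
        percentages.append(100.0 * sum(block) / len(block))
    return percentages


# Made-up outcomes for eight problems, grouped in blocks of four.
outcomes = [True, False, False, True, True, True, True, True]
print(trial2_percent_correct(outcomes, 4))  # prints [50.0, 100.0]
```

A rise in these block percentages from near 50% toward 100% across problems is the pattern the text describes as learning-set formation.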
Harlow (1950) suggested that the response 
patterns of the monkeys might provide an 
informative supplement to proportion correct 
as a dependent variable. He showed that er- 
rors made during LS were not random but 
represented systematic response tendencies 
that were inappropriate to the solution of the 
problems. The four principal error factors 
identified were stimulus perseveration, differ- 
ential cue, position preference, and response 


shift. A brief description of these errors fol- 
lows: 


1. Stimulus perseveration. Stimulus
perseveration errors consist of repetitive choices
of the incorrect stimulus object. These errors
are attributed to innate or learned preferences
for, or avoidance of, particular stimulus
objects. Stimulus perseveration errors, as
measured by runs of consecutive errors, decrease
as the LS progresses.


2. Differential cue. If the correct object
occupies the left side of the apparatus on Trial
1, there is some ambiguity as to whether
responses to the left or to the object itself are
being reinforced. Differential cue errors are
measured by the excess of errors on the trial
when the correct object first shifts position
over the errors made on comparable trials
when the correct object has not yet shifted
positions. The number of errors on shift trials
divided by the number of errors on
corresponding nonshift trials is called the
comparable-trial error ratio. When this ratio is
equal to 1, presumably no differential cue errors
occurred. Harlow (1950) also reported that
differential cue errors decreased across training
and the comparable-trial error ratio
approached 1.0. However, Davis, McDowell,
and Thorson (1953) found that while the
absolute number of differential cue errors
decreased across trials, the comparable-trial
error ratio, if anything, increased.

2 The term “monkeys” refers to rhesus monkeys.
Approximately 90% of learning-set studies in
[...] have used rhesus monkeys. Departures
from the use of rhesus are noted in the text.

3. Position preference. Position preferences
are consistent responses to the left or right
foodwell, regardless of the position of the
correct object. Abordo and Rumbaugh (1965)
found that squirrel monkeys given training
such that the correct object switched
positions after each correct response performed
better under conventional LS procedures than
subjects given conventional training
throughout. In general, however, position preferences
are only a minor source of errors in monkey
LS formation.


4. Response shift. Response shift errors are
errors of responding to the incorrect object
after the correct object has been displaced on
previous trials. This is normally measured as
an excess of errors following a rewarded
trial over the errors following an incorrect
trial. Harlow suggested that this is
attributable to the monkeys' tendency to
explore the test situation, that is, the
alternative.
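
The comparable-trial error ratio described under differential cue errors can be sketched as follows. This is a minimal illustration, assuming error counts have already been tallied separately for shift trials and for comparable nonshift trials; the function names are hypothetical, and the numbers in the usage note are invented solely to show how the ratio and the absolute excess can move in opposite directions.

```python
def differential_cue_excess(shift_errors: int, nonshift_errors: int) -> int:
    """Excess of errors on shift trials over comparable nonshift trials."""
    if shift_errors < 0 or nonshift_errors < 0:
        raise ValueError("error counts must be non-negative")
    return shift_errors - nonshift_errors


def comparable_trial_error_ratio(shift_errors: int, nonshift_errors: int) -> float:
    """Errors on shift trials divided by errors on comparable nonshift trials.

    A ratio of 1.0 is read as the absence of differential cue errors.
    The ratio is undefined when no nonshift errors were made.
    """
    if shift_errors < 0 or nonshift_errors < 0:
        raise ValueError("error counts must be non-negative")
    if nonshift_errors == 0:
        raise ValueError("ratio is undefined when nonshift_errors is zero")
    return shift_errors / nonshift_errors
```

With invented counts of 60 shift and 40 nonshift errors early in training, the excess is 20 and the ratio 1.5; with 20 and 10 errors later, the excess falls to 10 while the ratio rises to 2.0. A pattern of this kind would be consistent with the contrast drawn above between the absolute number of differential cue errors and the ratio.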


Learning-Set Performance after Trial 1
Reward and Nonreward


On logical grounds reward and nonreward
on Trial 1 of a problem should be equally
informative; that is, if the object chosen were
rewarded, the subject should continue to
respond to it, while if the object chosen is
nonrewarded, he should avoid that object on
future trials and choose the correct
object. However, initial reward and nonreward
do not have equivalent effects on
performance within a problem, nor is their effect
constant across problems.
The following generalization can be made:
Early in training there are more correct
responses following a correct Trial 1 response
than following a Trial 1 error; but later in LS
training the opposite holds: there are more
correct responses following an error than
following a correct Trial 1 response (e.g.,
Warren, [...]; see also Reese, 1964, for other
[...]). An exception may be for difficult
two-dimensional pattern discriminations, in which
strong stimulus preferences appear to occur,
and in which correct first-trial responses are
more efficient in reducing errors throughout
LS training (e.g., Leary, 1958a; McConnell &
Schuck, 1962).

When a single object is presented on Trial
1 either rewarded or nonrewarded before
the choice trials begin, a different picture
emerges. This procedure has the advantage of
breaking the correlation between stimulus
preferences and Trial 1 choices. With
single-stimulus training one finds that throughout
training, Trial 1 nonreward uniformly
reduces errors more than does Trial 1 reward
(Boyer [1965],3 1966; Fletcher & Cross,
1964; see Reese, 1964, for earlier work;
Schwartzbaum & Poulas, 1965).

3 W. N. Boyer. Discrimination performance in
[...] species of monkeys as a function of [...]
contingency, intertrial interval, and prior test
experience. Unpublished doctoral dissertation,
Oklahoma State University, 1965.

Two possible confoundings present in studies
involving single-stimulus pretraining
obviate drawing any firm conclusions as to the
effectiveness of reward and nonreward. First,
there is the possibility that adapting animals
to the test situation by having them displace
single objects for rewards prior to LS training
biases the monkeys not to attend to a singly
presented, rewarded object. Harlow (1959)
cites an unpublished study by Schrier and
Harlow in which the difference in performance
between initial reward and nonreward
(favoring nonreward) was five times greater than
the usual 10% difference, implying that
monkeys learned virtually nothing following
an initial rewarded response. The only
difference in procedure was that the Trial 1 object
was placed over the center foodwell, which
otherwise was used solely for adapting
animals to displace objects. This study suggests
that processes related to attention are
probably producing effects during single-stimulus
pretraining which make the relative
effectiveness of reward and nonreward difficult to
evaluate using this procedure. A second
shortcoming of the single-stimulus pretrain- 
ing paradigm is that responses to novel ob- 
jects have differential effects on performance 
following rewarded and nonrewarded first 
trials. Any tendency to respond to novel 
objects would increase the estimate of amount 
learned from a nonrewarded first trial and 
decrease the estimate of the amount learned 
from a rewarded trial. As the following sec- 
tion of this study indicates, novelty and 
familiarity can be distinctive cues for mon- 
keys. 


Effects of Novelty and Familiarity 


Test-experienced monkeys appear to retain 
a tendency to approach novel stimuli, but 
under conditions favorable to generalization 
from the positive to the negative stimulus 
(e.g., the negative stimulus has not been
chosen and nonrewarded), the tendency to 
approach familiar, negative stimuli becomes 
stronger than the tendency to approach novel 
objects (Behar, 1962a, 1962b; Leary, 1956; 
see also Reese, 1964). 

Monkeys can solve two-trial LS problems 
when the correct solution is to choose a newly 
presented Trial 2 object and to avoid either 
the correct or incorrect Trial 1 object (Brown, 
Overall, & Blodgett, 1959). Monkeys can also 
learn to approach or avoid specific recurring 
stimuli from previous problems (Gentry, 
Overall, & Brown, 1958) even on Trial 1 of 
new problems (Riopelle, Cronholm, &
Addison, 1962).

Cross, Fletcher, and Harlow (1963) showed 
that the cue of familiarity can be established 
from home-cage experience with objects. For 
one group (positive) the home-cage objects 
were designated as correct in the test situa- 
tion, for another group (negative) the ob- 
jects were always incorrect in the experiment, 
and for the third group (mixed group) the 
home-cage stimuli were arbitrarily designated 
as correct and incorrect. A control group re- 
ceived no home-cage experience with the ob- 
jects. The positive group and the negative 
group performed significantly better than 
chance on Trial 1 of the problems involving 
one home-cage stimulus, showing that they 
were able to use familiarity as a cue. The
negative group performed better than the 
positive group, suggesting that subjects had a 
tendency to approach the novel objects. The 
mixed group did not differ from control sub- 
jects on Trial 1 of new problems and were 
actually inferior to them on Trials 2-12 (see 
also Shell & Riopelle, 1958). 

In summary, novelty and familiarity can
be and are effective cues for monkeys and, as
such, complicate inferences concerning the
effectiveness of reward and nonreward. In the
following section, we see that experiencing a
diversity of problems is important in LS
formation.


Problem Diversity 


Schusterman (1962, 1964) demonstrated 
that chimpanzees given repeated reversal of 
a single discrimination problem show immedi- 
ate efficient Trial 2 performance when 
switched to LS training. Schusterman’s apes 
seemed to transfer a generalized strategy of 
win-stay, lose-shift with respect to objects. 

Monkeys in contrast to apes may require a 
more diverse selection of stimuli to form an 
LS. Riopelle (1953) reported that giving 
2,000 training trials on six object discrimina- 
tions followed by conventional LS training re- 
sulted in LS performance similar to that of a 
group of monkeys that had not received prior 
training. Treichler (1966) trained monkeys 
for 840 trials on two discrimination problems 
and found that an immediate win-stay,
lose-shift strategy did not result. The modest
transfer observed in his experiment may be at-
tributable to specific interproblem stimulus 
generalization. 


A study by Riopelle (1955a) suggests that
the minimum number of different stimuli
needed for LS formation may be quite small.
Five naive monkeys who had learned 10
preliminary discriminations were trained to a
criterion of 5 trials in a row correct (or a
maximum of 50 trials) on two-choice problems
consisting of the various combinations of four
different objects. They received 18 scrambled
repetitions of each of the 12 possible
combinations of the four stimuli, or a total of 216
problems, before being shifted to conventional
LS. Since all the combinations of the four
objects were used, subjects received
considerable reversal learning experience. The
monkeys showed immediate excellent LS
performance on the conventional six-trial problems.
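
The counts in this design can be checked by enumeration. The sketch below assumes that a "combination" means an ordered pairing of a correct with an incorrect object, which is the reading that yields 12 problems from four objects; the object labels A to D are hypothetical.

```python
from itertools import permutations

# Hypothetical labels for the four training objects.
objects = ["A", "B", "C", "D"]

# Each problem is an ordered pair: (correct object, incorrect object).
problems = list(permutations(objects, 2))

repetitions = 18
total_problems = repetitions * len(problems)

# Every problem has a counterpart with the reward assignment reversed.
every_problem_has_reversal = all(
    (incorrect, correct) in problems for (correct, incorrect) in problems
)

print(len(problems), total_problems, every_problem_has_reversal)
```

Under this reading, each of the six object pairs appears with both reward assignments, which is why the procedure gives the subjects considerable reversal experience even though only four objects are used.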
On the other hand, Riopelle and Moon
(1968) found that diversity of problems
enhances the LS formation. Two groups of
stumptail monkeys were given 10
problems each day of training consisting of 7
familiar (repeated) and 3 new problems. The
predictable reward group had the same object
designated as correct for the repeated
problems, and the unpredictable reward group had
reward randomly assigned to objects on
repeated problems. For a control group all
problems each day were new. After
differential pretraining, all three groups were
given 50 six-trial problems with all new
stimuli. The predictable reward group
reached perfect performance on the first trial
of recurrent problems. On transfer to LS
problems that employed all new stimuli, the
control group performed significantly better than
either of the other groups having received less
diverse stimuli. This suggests experiencing a
variety of stimuli is important for the
establishment of LS in monkeys (see also the
section on transfer of learning sets).


Retention of Discriminations and LS


Most theoretical descriptions of LS lead
one to expect relatively poor retention of
individual discriminations. According to Harlow:


By the time the monkey has run 232 discriminations
and followed these by 112 discriminations and
reversals, he does not possess 344 or 456 specific
habits, bonds, connections, or associations. We doubt if
our monkeys at this time could respond with much
more than chance efficiency on the first trial of any
series of previously learned problems. But the
monkey does have a generalized ability to learn any
discrimination problem or any discrimination
reversal problem with the greatest of ease [Harlow,
1949, p. 63].


1. Intertrial interval. Relatively short-term
retention of object discriminations has been
assessed by studies varying intertrial intervals.
For intertrial intervals between [...] and [...]
seconds, performance typically declines,
though only modestly (Boyer, [...]; Fletcher
& Cross, 1964; Harlow, 1959; Harlow &
Warren, 1952; Kruper, Patton, & Kos[...], [...];
Riopelle & Churukian, 1958). For example, in
the Riopelle and Churukian study there was
about a 5% decrease in performance as the
intertrial interval increased from 10 to 60
seconds.

2. Concurrent versus consecutive problem
sequences. Typically, a series of problems is
presented to a monkey such that he receives
all the predetermined trials on one problem
before being given the next problem. In the
concurrent procedure the problems are
presented in a paired-associate fashion so that
Trial 1 of each problem appears before Trial
2 of any problem is given. The concurrent
procedure may involve both interference from
other problems in the list and increased forgetting
from longer interpresentation intervals. As
one might expect, consecutive stimulus
presentation results in better performance than
concurrent (Darby & Riopelle, 1955), and
concurrent list performance decreases with
increases in list length at least for squirrel
monkeys ([...]ine & Goodman, 1966) and
chimpanzees (Hayes, Thompson, & Hayes, 1953). The
concurrent presentation procedure on the
other hand is not especially difficult for rhesus
monkeys (e.g., Leary, 1958a, 1962), and
retention of a well-learned list is excellent at a
[...]-hour retention interval (Sledjeski &
[...]ench, 1968).
3. Long-term retention. Monkeys are able
to retain not only object discrimination LS
(e.g., Braun, Patton, & Barnes, 1952) but
also specific object discriminations. Monkeys
have been found to retain discriminations
virtually perfectly for at least 24 hours
after six acquisition trials per problem
(Riopelle & Moon, 1968; Riopelle et al.,
1962). Mason, Blazek, and Harlow (1956)
reported above-chance first-trial performance
by rhesus monkeys on a series of 90 six-trial
object discriminations which were
unintentionally repeated after a 1-month interval.
Strong (1959) gave four naive monkeys
extensive training on 72 pairs of stimulus
objects and found extremely good performance
([...]% correct) at retention intervals between
[...] and 210 days. Zimmermann (1969)
reported about a 15% retention loss in monkeys
for each cycle of 100 discrimination problems,
which were repeated every 20 days. By the
[...] cycle, Trial 1 retention was 83% correct.
A [...]-month retention interval before the last
cycle resulted in a 20% memory loss.
In a series of studies of retention in LS-
experienced monkeys, Bessemer found effi- 
cient performance on object discriminations 
over a 24-hour retention interval but that, 
strikingly, the retention loss appeared to be 
confined to those problems on which the first 
trial response during training had been in- 
correct and nonrewarded. Since subjects pre- 
sumably responded to their preferred stimu- 
lus on Trial 1, stimulus preferences can be 
pinpointed as playing a significant role in 
retention; when the preferred stimulus is 
correct, retention performance is better. Even 
if the associative information were lost, stimu- 
lus preferences would insure good retention 
performance when the preferred stimulus is 
correct. In an experiment that eliminated 
stimulus preferences by using single-stimulus 
presentations, the differential retention effect 
disappeared. Bessemer’s finding that stimulus 
preferences were strongly influencing retention 
is of considerable theoretical importance since 
the manifestation of stimulus preferences in 
these LS-experienced subjects would perhaps 
be unexpected. 

4. Transfer suppression and interproblem
interference. Riopelle (1953) proposed that in 
the course of LS, monkeys suppress transfer 
of inappropriate problem solutions and treat 
successive problems increasingly indepen- 
dently. Eventually, according to the transfer 
suppression theory, responses to a particular 
problem do not transfer to succeeding prob- 
lems. To assess this idea, Riopelle tested naive 
monkeys on six problems each day for 63 days, 
with the sixth problem being a reversal of 
either the first or fourth problem of that day. 
Initially the subjects made 80% errors on 
the first trial of the reversed problems, but 
eventually their performance reached 60% 
errors on the first trial of reversals and was 
not different from that on nonreversal prob- 
lems on Trials 2-6. 

5 D. W. Bessemer. Retention of object
discriminations by learning set experienced monkeys.
Unpublished doctoral dissertation, University of
Wisconsin, 1966.

However, these results do not imply that
LS is accompanied by increased forgetting
about specific object discriminations. As
Stollnitz and Schrier (1968) pointed out, LS
formation could not have been based on the
development of stimulus independence, since
the monkeys in Riopelle's study had largely
formed their LS before they started to
suppress transfer. Schrier and Stollnitz tested
LS-experienced monkeys for transfer
suppression by replicating Riopelle’s (1953)
procedure. Their monkeys averaged only 17%
correct responses on Trial 1 of the reversals.
Transfer suppression had not resulted from 
their prior LS training. A second experiment 
employed LS-experienced stumptail monkeys 
using Riopelle’s procedure for a more ex- 
tensive testing period. Trial 1 performance on 
reversals was initially poor and, surprisingly, 
did not improve with practice. A replication of 
Schrier (1969) employed both rhesus and 
stumptail monkeys, and again stumptails 
failed to improve on Trial 1 of reversed prob- 
lems over a 13-week practice period. The 
rhesus monkeys did improve (showed transfer 
suppression), and one subject performed sig- 
nificantly above chance on Trial 1 of the 
reversals. This above-chance performance is 
contrary to the transfer suppression proposi- 
tion of problem independence. The pattern of 
results supports the contention that improve- 
ment in performance on Trial 1 of reversals 
that are interspersed during LS training rep- 
resents learning to reverse when familiar 
stimuli reappear rather than any forgetting 
phenomenon such as implied by transfer sup- 
pression theory. There is independent evi- 
dence that monkeys can learn to reverse a 
previous choice after an arbitrary signal
(Riopelle & Copeland, 1954).


Transfer of Learning Sets 


Learning sets may facilitate or hinder
performance on different types of stimuli or
problems. Wilson and Wilson (1962) reported
a small but reliable transfer between visual
[...] LS problems (see von Wright, 1970, for a
[...] modal transfer). Harlow and [...].
Performance on the form-discrimination
problems started at chance on Trial 2, but the
form LS developed faster than the
color-discrimination LS. King (1966) found that
concept-formation training (i.e., [...]
regardless of form) transferred to [...]
discrimination LS better than training with
rewarded presentations of single objects.
Rumbaugh, Sammons, Prim, and Phillips ([...])
reported that giving squirrel monkeys
pretraining for 3,000 trials with either a single
stimulus or with 500 different stimuli on
a 50% reward schedule retarded LS
formation. Interposing a task where monkey
performance does not rise much above chance
(such as double alternation) also retards
object-discrimination LS (Rumbaugh & Prim,
1964; Warren & Sinha, 1959).
Particular theoretical significance attaches
to transfer between object-discrimination LS
and reversal LS. In terms of [...]
problem solution, both procedures have the
solution win-stay, lose-shift (with respect to
objects), and on some grounds one might
expect perfect transfer between them.
Chimpanzees show immediate direct transfer from
repeated reversals involving just a few pairs
of stimuli to object LS (Schusterman, 1962,
1964). Repeated reversal learning also
facilitated the LS formation of Philippine and
stumptail monkeys but to a lesser
extent (Schrier, 1966). Even repeated
reversal of a position discrimination
facilitates object-discrimination LS in monkeys
(Warren, 1966).

If the nonreward on the first trial of
reversal comes to act as a cue for
reversal, then object-discrimination LS might
facilitate reversal LS only to the extent that
efficient prereversal performance makes the
nonreward on the first reversal trial a
distinctive cue. The results of a number of
studies with a variety of primate species suggest
that object-discrimination LS facilitates
reversal LS but that the transfer is by no
means complete or perfect ([...],
1965; Harlow, 1944, 1950, [...]; [...],
1951; Rumbaugh & Ensminger, [...]; [...],
1966). In other words, transfer between
object and reversal LS does not simply involve
employing a win-stay, lose-shift strategy.
Information Value of Rewards


1. Partial reinforcement. Behar (1961b)
gave naive and sophisticated monkeys a
choice between receiving two units of reward
[...] of the time or one reward 100% of the
time. Naive subjects showed no preferences,
while the test-experienced monkeys chose the
consistently rewarded object, suggesting that
nonreward had a greater effect on the
sophisticated than on the naive monkeys.

[...] (1963) tested LS-experienced
monkeys under conditions where correct
responses were rewarded with various
probabilities. The within-subject design used reward
percentages of 100%, 75%, 50%, and 25%.
[...] performance was efficient only
for reward percentages of 100% and 75%.
In another condition, one of two colors was
revealed when the objects were displaced.
These color markers followed the positioning
of the correct and incorrect objects and thus
could serve as secondary reinforcers. The
particular colors used changed from problem to
problem, so the color information could be
useful only after the first rewarded trial.
Under this procedure, performance on the
25% and 50% conditions improved
significantly.

2. Ambiguous-cue problems. Consider three
objects, A, B, and C, that are presented in
pairs consisting either of A and B, with A
correct, or B and C, with B correct. To solve
the problem, B must be chosen when paired
with C, and must be avoided when paired with
A. Performance on AB trials versus BC trials
can be used to estimate the relative effects of
reward and nonreward. The data reveal that
BC trials typically result in fewer errors than
AB trials for objects, while the reverse
appears to hold true for two-dimensional stimuli
(Bernstein, 1961; Fletcher, Grogg, & Garske,
1968; Leary, 1958b; Thompson, 1954).
Unfortunately, inferences concerning the
effectiveness of reward and nonreward depend
crucially upon assumptions concerning the
absence or presence of within-problem
generalization, so that no strong conclusions have
been drawn.


3. [...] that when the stimuli are separated
from the site of the monkey’s response by
even a minimal distance, discrimination
performance is severely retarded. It would follow
from this that monkeys might learn only
about the object chosen in a discrimination
problem. Lockhart, Parks, and Davenport
(1963) used test-experienced pigtail monkeys
to directly test this proposition. For one group
of animals, the object not chosen on the first
trial was replaced for Trials 2-6 of six-trial
problems, while for the other group the dis- 
placed object was replaced for Trials 2-6. The 
reward conditions were dictated by and con- 
sistent with the first trial outcome. The sub- 
jects having the unchosen object replaced 
were correct on 85% of their Trial 2 choices, 
while the subjects having their chosen object 
replaced performed at chance level, regardless 
of whether their first trial was correct or in- 
correct. 

Learning is not solely restricted to the ob- 
ject chosen. Brown and Carr (1958) placed 
the objects for the next problem 6 inches be- 
hind the objects being used for the current 
problem. The object which was to be correct 
on the following problem was always behind 
the object which was currently correct. Their 
monkeys showed significantly better-than- 
chance performance on the first trial of the 
new problems, indicating that they had 
learned something about the incidental cues 
(see also Davis, 1965, and Zeis [1964],6 for
other incidental learning studies). 

There is some evidence that with extended 
practice, animals can learn something about 
the unchosen object. Bowman and Takemura 
(1966) also used the procedure of replacing 
either the chosen or the unchosen Trial 1 ob- 
ject which could be rewarded or nonrewarded. 
For one group the recurring object was always 
correct, while for the other the recurring ob- 
ject was incorrect. When the chosen object 
was brought forward, both groups were able 
to master the problems. The group having 
the unchosen object brought forth as correct 
learned much more slowly but eventually did 
master the problem. One might argue that 
subjects were using familiarity as a cue, since 
the recurring object was always correct. Even 
so, this would indicate that familiarity with 
objects can be acquired without directly
displacing them. The monkeys having the
unchosen object brought forward as incorrect
immediately reached a level of 70% correct on
Trial 2 and did not appear to improve their
performance. Note that one need not invoke any
associative process to explain this result if
stimulus preferences are assumed to occur.
Variables leading to not choosing the object on
the first trial may also operate on Trial 2 to
produce 70% performance even in the absence of
learning.

6 S. M. E. Zeis. The use of peripheral cues in
learning set formation by rhesus monkeys.
Unpublished doctoral dissertation, Catholic
University of America, 1964.

Fletcher and his associates (Fletcher, 1966; 
Fletcher & Takemura, 1965; Fletcher et al., 
1968) have employed a prompting procedure 
where a cue (i.e., the prompt) is attached to
either the correct (positive prompt) or in- 
correct (negative prompt) object. Groups are 
given pretraining with either a positive or a 
negative prompt to teach the monkeys the 
significance of the prompt. As a result, few 
errors occur on prompted trials during the 
main experiments. Learning is assessed by
trials in which the prompts are removed; and, 
in general, negative prompting results in bet- 
ter performance than positive prompting. Re- 
placing the correct object with a new object 
on transfer tests after negative prompting 
trials shows that monkeys apparently learn 
about the incorrect object on the basis of its 
previous pairings with the negative prompt, 
even though they have never chosen it. 

4. Control by positive and negative cues.
French, Birnbaum, Levine, and Pinsker
(1965) tested monkeys under the following
experimental procedures: (a) both the correct
and incorrect objects were constant; (b) the
correct object remained the same, but a new
incorrect stimulus was constantly introduced;
(c) the incorrect object remained the same,
and new correct stimuli were introduced;
(d) both new correct and incorrect objects
were introduced for each problem. The first
condition yielded the best performance, the
next two conditions had intermediate effects,
and performance was least efficient under the
fourth condition. In a somewhat related
experiment Ettlinger (1960) found for both
tactile and visual discriminations that
separating the correct cue by 2 inches from the
response site was more disruptive than
separating the incorrect object from the normal
response sites.
By using an automated apparatus and
two-dimensional stimuli, Sheridan, Horel, and
Meyer (1962) were able to allow the
correct, incorrect, both, or neither stimulus to
persist for certain time intervals after the
animal's choice response. Persistence of the
correct stimulus aided performance more than
persistence of the incorrect stimulus, both of
which were better than either of the other
conditions.

5. Rewards as cues. Apart from any
strengthening function, rewards can serve as
cues. A suitable paradigm to show this
involves two-trial problems where the correct
response on Trial 2 is determined by the Trial
1 outcome. That is, one can give problems
having the solution win-stay, win-shift,
lose-stay, or lose-shift with respect to the objects.
The combination of the first and fourth types
of problem is simply the usual
object-discrimination LS, but the second and third
types of problem require the subject to
abandon responses to the just-rewarded object or to
persist in responding to a cue despite
nonreward, respectively. The four types of
problems are typically administered in a
between-subjects design where only one of the four
problems is presented to a given group of
monkeys. For example, if a group is trained on
win-shift, its Trial 1 response will always be
rewarded, and on Trial 2 subjects must select
the previously unchosen object. The results
of a number of studies (Brown, [...]
Gaylord, 1965; McDowell & Brown, [...]
1963b, 1963c; McDowell & [...], [...]
1965b, 1965c, 1965d, 1965e; McDowell,
Gaylord, & Brown, 1965a, 1965b; Reese,
1964) show that monkeys can learn all four
solution types. Win-stay is by far the most
difficult of the four solutions to learn, a
result puzzling from the point of view that
rewards directly strengthen responses. In a
few experiments (Brown et al., 1965;
McDowell et al., 1965b), win-stay was not
learned at all. Perhaps responses [...]
and overall reward probability contribute to
this result, but there is at present no adequate
explanation. Within-subjects designs with
compatible problem types (e.g., win-stay
combined with lose-stay) would equate
reward probabilities and might elucidate
matters. Finally, Behar (1961a) has shown that
monkeys can form an object-alternation
(win-shift, lose-stay) LS.
Riopelle, in a series of studies (Riopelle,
[...]; Riopelle & Francisco, 1955; Riopelle,
Francisco, & Ades, 1954), has used a marble
as a Trial 1 signal. Six-trial problems were
given with no food, food, or a marble on Trial
1, signaling the chosen or unchosen object to
be correct and rewarded with food in Trials
2-6. Under these conditions, where win-shift
[...] were not rewarded on Trials 3-6
[...], win-stay and lose-shift were
learned better than win-shift and lose-stay
(see also Schwartzbaum & Poulas, 1965). An
interesting feature of Riopelle’s data is that
groups trained on marble-stay performed, if
anything, better than groups trained on
food-stay. It may be that food cues are less
efficient than marble cues, since the food may
distract the animals’ attention from the
chosen object. These experiments show that
as a Trial 1 cue (reinforcer), a marble can
function as effectively as food reward, and
they support emphases on the cue value of
reward.
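
The logic of the four two-trial solution types described under rewards as cues can be made explicit in a short sketch. It assumes a two-object problem in which the subject's Trial 1 choice and the remaining object are known; the names `Solution`, `trial1_is_rewarded`, and `correct_trial2_choice` are illustrative and do not come from any of the studies cited.

```python
from enum import Enum


class Solution(Enum):
    WIN_STAY = "win-stay"
    WIN_SHIFT = "win-shift"
    LOSE_STAY = "lose-stay"
    LOSE_SHIFT = "lose-shift"


def trial1_is_rewarded(solution: Solution) -> bool:
    """In a single-solution group, the Trial 1 outcome is fixed by the problem type."""
    return solution in (Solution.WIN_STAY, Solution.WIN_SHIFT)


def correct_trial2_choice(solution: Solution, trial1_choice: str, other_object: str) -> str:
    """Object that is rewarded on Trial 2 under the given solution type."""
    if solution in (Solution.WIN_STAY, Solution.LOSE_STAY):
        return trial1_choice
    return other_object


def object_discrimination_choice(trial1_choice: str, other_object: str, rewarded: bool) -> str:
    """Conventional object-discrimination LS: win-stay combined with lose-shift."""
    solution = Solution.WIN_STAY if rewarded else Solution.LOSE_SHIFT
    return correct_trial2_choice(solution, trial1_choice, other_object)
```

Written this way, the Trial 1 outcome functions purely as a cue that selects which object is correct on Trial 2; nothing in the rule requires that the rewarded response be the one repeated, which is the point at issue for a direct strengthening view of reward.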

ANALYSIS AND REVIEW OF THEORIES


This section reviews several theories that
have addressed themselves to the question of
what is learned during LS formation.

Mog; 

°dified Hull-s pence Theory 
Reese (1964) suggested a modified version
of Hull-Spence theory that might account
for LS formation. It correctly predicts that
early in LS training a success will reduce
future errors more than an error, while later
in training an error will reduce future errors
more than a success. Reese also drew the
implications that the stimulus preferences
must be overcome before the efficient LS per-
formance can occur, and retention should be
greater early in training than later. Doubt is
cast on the former proposition by Bessemer's
(see Footnote 5) finding that stimulus pref-
erences persist in LS-sophisticated animals
and on the latter by the efficient retention of
specific discriminations and the corresponding
failure to obtain transfer suppression.

More fundamentally, one might question
the assumption of a direct strengthening func-
tion of rewards. We have seen that monkeys
can learn problems requiring the animal to
avoid the object just chosen and rewarded
(i.e., win-shift) and to approach an object
that had just been followed by nonreward
(i.e., lose-stay) (e.g., McDowell et al., 1965a)
and that a marble is as reinforcing as a raisin 
(e.g., Riopelle et al., 1954). 


Hypothesis Theories 


The previously mentioned difficulty with 
Reese’s modification of Hull-Spence theory is 
resolved by hypothesis theories. They view 
LS formation as a more abstract process 
in which hypotheses rather than single re- 
sponses are regarded as subject to principles 
of reinforcement. Strategies or hypotheses 
rather than single responses may incorporate 
Trial 1 outcomes as cues, and thus a win-shift 
hypothesis is just as possible as a win-stay hy- 
pothesis. 

1. Harlow’s error-factor theory. Harlow's 
error-factor theory is a uniprocess theory, and 
only one basic learning process, inhibition, is 
assumed to occur. It is assumed that the cor- 
rect response is immediately available but 
must compete with many inappropriate re- 
sponses in the learning situation. Formation 
of a LS, according to the theory, occurs when 
the monkey has eliminated these error factors 
or inappropriate response tendencies. 

Error-factor theory is not fully developed 
since specific assumptions and postulates have 
not yet been presented. For example, while 
transfer suppression would seem to be con- 
sistent with error-factor theory, it is not ob- 
vious that lack of suppression constitutes evi- 
dence against the theory. The various error 
factors do not completely disappear during 
LS formation (e.g., Bessemer, see Footnote 5;
Davis et al., 1953), but the implications of 
this are unclear because the proportion of 
responses controlled by a given error factor 
presently cannot be determined, and conse- 
quently the strengths of the various error 
factors cannot be compared. 

2. Levine’s hypothesis behavior model. 
Levine's model (Levine, 1959, 1965) differs 
from Harlow's in several fundamental ways. 
The model includes both error-producing and 
reward-producing response patterns, hence 
the term hypothesis rather than error factor. 
Formation of LS occurs by the strengthening 
of the correct hypothesis by 100% reinforce-
ment and the gradual extinction of other hy-
potheses because of 50% reinforcement. All
response patterns are measured, and from 
this the proportion of the time a specific 
hypothesis is used can be estimated. The de- 
tailed hypotheses are specific enough so that 
by some algebraic manipulations Levine can 
derive the proportion of time each of these 
hypotheses was used in a set of data. The 
model was tested (Levine, 1959) by observing 
some proportions of response sequence pat- 
terns and then predicting the proportion of 
occurrence of other response patterns. In gen- 
eral the agreement between predicted and ob- 
served proportions of responses is quite good. 
Interestingly, with monkeys, nonzero estimates 
of hypotheses seem to occur only for position 
preference, stimulus preference, the correct 
solution,” and random or residual responding. 
At present, the model says nothing about re- 
tention of specific information except that 
hypotheses, and therefore memory concerning 
stimulus properties, persist for at least three 
trials. 
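
The contrast between 100% and 50% reinforcement can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions, not Levine's algebraic procedure: the function name, the three-trial problems, the random left-right placement, and the coding of the correct solution as win-stay, lose-shift with respect to the object are all hypothetical choices made for the example. It tallies how often each response pattern would be rewarded on Trials 2 and 3.

```python
import random


def reinforcement_rates(n_problems=2000, trials_per_problem=3, seed=0):
    """Tally reward rates on Trials 2+ for three hypothetical response patterns.

    Assumptions (not taken from Levine's papers): two objects "A" and "B",
    the correct object chosen at random for each problem, and the left-right
    placement of the objects randomized on every trial.
    """
    rng = random.Random(seed)
    rewarded = {"position_preference": 0,
                "stimulus_preference": 0,
                "win_stay_lose_shift": 0}
    scored_trials = 0

    for _ in range(n_problems):
        correct = rng.choice("AB")
        previous_choice = None
        previous_rewarded = False
        for trial in range(trials_per_problem):
            left_object = rng.choice("AB")

            # Position preference: always take whatever is on the left.
            position_choice = left_object
            # Stimulus preference: always take object "A".
            stimulus_choice = "A"
            # Win-stay, lose-shift with respect to the object.
            if trial == 0:
                strategy_choice = rng.choice("AB")  # Trial 1 is a guess
            elif previous_rewarded:
                strategy_choice = previous_choice
            else:
                strategy_choice = "B" if previous_choice == "A" else "A"

            if trial > 0:  # Trial 1 carries no information, so it is not scored
                scored_trials += 1
                rewarded["position_preference"] += position_choice == correct
                rewarded["stimulus_preference"] += stimulus_choice == correct
                rewarded["win_stay_lose_shift"] += strategy_choice == correct

            previous_choice = strategy_choice
            previous_rewarded = strategy_choice == correct

    return {name: count / scored_trials for name, count in rewarded.items()}


if __name__ == "__main__":
    for name, rate in reinforcement_rates().items():
        print(f"{name}: {rate:.2f}")
```

Under these assumptions the object strategy is rewarded on every scored trial, while the position and stimulus preferences are rewarded on roughly half of them.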

3. Restle's mathematical model. Restle’s 
(1958) explanation of LS formation empha- 
sizes the cue function of rewards and centers 
importantly on an assumption of abstract 
cognition. The theory assumes three classes of 
cues are available during LS formation. There 
are Type a cues, which are relevant and com- 
mon to all problems of the experiment; Type 
b cues, which are relevant within any one
problem but which are not valid across prob- 
lems; and Type c cues, which are not valid at 
any time. The Type a cues are abstract, and 
LS formation is seen to involve learning to 
use Type a cues and to ignore or adapt out
invalid cues (Type c) and those valid within 
individual problems (Type b). 


Restle cast his theory in mathematical form
and discovered parameter values which al-
lowed him to accurately describe both intra-
and interproblem learning curves of data from
previous LS studies.

* Bowman (Bowman, 1963 
uggested separat: 
ch other since t 


; Bowman & Takemura, 
ing win-stay and lose- 
he two components may 
man's analysis is similar 


enou; i i 
gh to t is not given separate 
Since Type b cues get adapted, it is
crucial for the theory to predict transfer sup-
pression, and, unfortunately for the theory,
transfer suppression does not occur.

A question applying to all three A dil 
concerns the interplay between abstrac Hy- 
or rules and specific stimulus properties. con] 
pothesis models principally have = nting 
cerned only with describing or CU an 
differences in performance between initi npha- 
terminal states of learning with little emP. 


n. 
ES eae ttentio” ; 
sis on transitional functions. poker. 
has been directed to ways in whic eate ill 


processes studied in other contexts OI 
LS formation. The escape from the € E 
of an automatic strengthening em 
reward appears to have been at m 
of a loss of contact with such mt 
areas of learning theory as stimulus generali-
zation and retention. The model discussed in
the next section ties LS formation
closely to specific stimulus properties and is
consistent with efficient retention of specific
discriminations.


Feedback Theory of Learning-Set Formation

The reader is referred to a compar, of 
per ê by this author for a full — 9 
the feedback theory of LS formati » mode 
theory is an elaboration of a pepe suc 
(Estes, 1966) which has already se gar | 
cessfully applied to reward magnitu singh 
ing by monkeys (Meyer, LoPopolo: scans 
1966). The main feature of x^ sub. "a 
model is that on a given trial, t^ eed D2” 
scans the available cues, generates : r ef 
(covert prediction of reward value p he f 
cue, and makes the response a 
dicts will yield the highest feedback. prop?” 

Choices are controlled by stimu “i M 
ties of cues (i.e, their salience) pet B ir 
pected feedback from previous Hm r " 
wards affect choices by the facilita n cd 
hibitory feedback their anticipalO” . y 


ning pg 
but they do not directly affect i tt 
applied to LS, the model assun or ont 
wards and nonrewards increase ate Fi ye" 
the expected feedback or antiOD® ial © 
value for the cue to which the aici 
5 D. L. Medin. A feedback model for discrimina-
tion learning set in monkeys. Unpublished manu-
script.


DISCRIMINATION LEARNING SET 


j E An individual problem is solved by 
E ads e selection of valid cues rather than 
e of irrelevant cues. For object- 
monkey ion LS, the model assumes the 
object ur ors to one of four cues: (a) one 

Bun" ) the other object, (c) the left po- 

and (d) the right position. 

S pe soe ds LS improvement is assumed 

Emus | m stimulus generalization of 

requires rom expected rewards. This idea 

» tion pps: explanation, since. the sugges- 

| Problem si is mediated by specific between- 
casually ái imulus generalization was rather 

-Céviey, EM in the introduction to this 
feedback Dea to the model, transfer of 
Mysterion "p anticipated rewards does not 

the object y become associated solely with 

. Positive f to be correct on future problems. 
With Pon eds from rewards associated 

idle 4 cues on previous problems is 
€ futur by stimulus generalization to both 
ough = Correct and incorrect objects. Even 
the opi t differential advantage accrues to 

Scanning to be correct on new problems, the 

Ore efi Or cue-selection process functions 

nt. ciently when initial feedback from 

Biostar’ rewards is high. Given high initial 

focus lon of rewards, the monkey is able to 

ay bs the correct cue almost immediately, 
When me discrimination is solved quickly. 
and ler feedback associated with the correct 
Nees eet stimuli is low, even large differ- 
Re ds feedback for the correct and incor- 
On «s do not lead to efficient performance. 
be a, 280M for this is that both cues would 
! tion A disadvantage in competing for atten- 
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í leedbach Other cues in the situation. When the 
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D . 
t lon erfectly, High initial feedbacks, in 
e dy to placing certain cues at a competi- 
^ a : 
s~ tage, allow the scanning process to 
tap, wey, nicah 
er, i ! 
‘ch Strength in general, reward does not automati- 
Xen Might en responses since a decrease in feed- 
«Wi trained ollow reward after the monkeys have 
“shig that nonreward follows reward as on 
Problem, 
perform more efficiently. This ability to effi-
ciently focus on, or so to speak, home in on,
cues with high feedback is a property of many 
stochastic learning models. 

Most of the detailed predictions of the 
model were drawn from a computer simula- 
tion. The model is able to generate typical 
between- and within-problem LS functions. It 
correctly predicts that early in LS training a 
correct response reduces future errors more 
than an error, while late in LS an error re-
duces future errors more than a correct re-
sponse. In addition, data produced by the
model showed a pattern of error factors quite 
similar to that of real monkeys (Davis et al., 
1953). 
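
The kind of simulation referred to above might look, in miniature, like the following. This is a hedged sketch rather than the model's actual program: the function names, the starting feedback values, the linear update rule, and its rate are assumptions introduced here for illustration, and cue salience and probabilistic scanning are left out. It keeps only two ideas from the verbal description, namely that the response goes to the cue with the highest expected feedback and that reward or nonreward raises or lowers the expected feedback of the attended cue.

```python
def choose_cue(expected_feedback):
    """Return the cue with the highest expected feedback (first one wins ties)."""
    return max(expected_feedback, key=expected_feedback.get)


def run_problem(correct_object, n_trials=6, start=0.5, rate=0.5):
    """Simulate one two-object problem; returns the object chosen on each trial.

    The four cues follow the text: two objects and two positions. The
    starting value and the linear update are hypothetical.
    """
    feedback = {"object_1": start, "object_2": start,
                "left": start, "right": start}
    choices = []
    for trial in range(n_trials):
        # Alternate which object sits on the left (an arbitrary schedule).
        left, right = (("object_1", "object_2") if trial % 2 == 0
                       else ("object_2", "object_1"))
        cue = choose_cue(feedback)
        if cue == "left":
            chosen = left
        elif cue == "right":
            chosen = right
        else:
            chosen = cue
        outcome = 1.0 if chosen == correct_object else 0.0
        # Move the attended cue's expected feedback toward the outcome.
        feedback[cue] += rate * (outcome - feedback[cue])
        choices.append(chosen)
    return choices


if __name__ == "__main__":
    print(run_problem("object_2"))
```

With equal starting values the sketch makes at most one error before settling on the correct object; a fuller version would need the salience and generalization assumptions that this example omits.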

Specific criticisms of the feedback model 
await its future development and testing. If 
the feedback model for discrimination LS 
were to receive consistent support, it would 
have major implications for what monkeys 
learn during LS training. Historically, LS for- 
mation has been considered to be a complex 
abstractive process involving a conceptual 
understanding on the part of the monkeys. 
The feedback model does not assume any such 
abstract cognition and suggests that specific 
stimulus properties play a major role in LS 
formation. The feedback model suggests that 
the monkey does not so much learn how to 


learn as learn what to expect. 
EXPLANATION OF REWARDS THAT DO NOT
REDUCE TISSUE NEEDS
Theories offered for the variety of non-need-reducing stimuli which elicit the
organism's interaction and serve as rewards are evaluated, including the theoreti- 
cal analyses of Berlyne, Dember and Earl, Fiske and Maddi, Fowler, Glanzer, 
Jones, McClelland and Clark, Montgomery, Myers and Miller, Premack, and 


discussed. 


hes early 1950s, researchers began pay- 

dm erai attention to incentives that have 

uncti ent Ussue-maintenance or reproductive 
j on. Animals were found to learn tasks 
à creased, for example, a moderate in- 
Donny) the intensity of dim illumination, a 
alley we sweet liquid, or a novel maze 
alley. While some theorists retain the assump-
tion that the reward value of all such stimuli
is the result of prior pairings with reduction of
tissue needs or with sexual motivation, the
predominant view today seems to be that
there is a variety of stimuli whose reward
value is unrelated to the reduction of tissue
needs or sexual motivation. This study evalu-
ates explanations for the variety of non-need-
reducing stimuli which elicit the organism's
interaction and serve as rewards.


DEPRIVATION-SATIATION ANALYSES
Stimulus-Satiation Theories


expiring to a number of early analyses of 
Berlyn Ty behavior and curiosity (e.g., 
195 ha 1950; Glanzer, 1953; Montgomery, 
Sivenes Rothkopf & Zeaman, 1952), respon- 
lon, SS to a novel stimulus decreases the 


the organism's exposure to it, and in- 
Sheffield. Some of the conceptual, empirical, and methodological issues, which
seem most basic at this time for testing and extending these theories, are 


stated the hypothesis in a very general way, 
applying it to any stimulus eliciting the orga-
nism's interaction. Montgomery (Montgomery,
1954; Montgomery & Segall, 1955) demon- 
strated that not only will a novel maze alley 
elicit exploration but responses will be 
strengthened which afford access to the alley. 
Montgomery's analysis implies that increased 
familiarity with a novel stimulus will serve to 
decrease both the amount of interaction elic- 
ited by the stimulus and the stimulus’ reward 
value. The general proposition which these ac- 
counts suggest is that any stimulus eliciting 
the organism's interaction is subject to de- 
privation-satiation effects and related changes 
in reward value. Examination of this simple
hypothesis will provide a useful preliminary 
to discussion of higher-order explanations for 
primary rewards that do not reduce tissue 
needs. 

Research during the past 20 years has 
shown that responsiveness to, and for, many 
kinds of stimulation, besides the traditional 
examples of food and water, does indeed 
change with the degree of prior availability of 
the stimulation. Lengthening the interval be- 
tween successive stimulus exposures has been 
found in rats, monkeys, and other species to 
increase the probability of such behaviors as 
visual exploration of novel stimuli (Butler, 
1957; Rabedeau & Miles, 1959), locomotor
exploration of novel stimuli (Berlyne, 1955; 
Fowler, 1965, 1967; Myers & Miller, 1954; 
Schneider & Gross, 1965), unrewarded ma- 
nipulation (Forgays & Levin, 1961; Premack 
& Bahwell, 1959), and light-contingent re- 
sponding (Forgays & Levin, 1961; Fox, 1962; 
Premack & Collier, 1962; Stewart, 1960, p. 
188). Decrements in responding during ses-
sions that follow deprivation have been re-
ported for each of the above stimuli: visual
exploration (Butler & Harlow, 1954; Rabe- 
deau & Miles, 1959), locomotor exploration 
(Adlerstein & Fehrer, 1955; Berlyne, 1955; 
Glanzer, 1961; Montgomery, 1951, 1952b, 
1955; Montgomery & Monkman, 1955; Mont- 
gomery & Zimbardo, 1957; Welker, 1957; 
Williams & Kuchka, 1957), unrewarded ma- 
nipulation (Forgays & Levin, 1961; Harlow, 
1950; Kling, Horowitz, & Delhagen, 1956; 
McCall, 1965; Premack & Bahwell, 1959; 
Schoenfeld, Antonitis, & Bersh, 1950; Welker, 
1956), light-contingent responding (Forgays 
& Levin, 1961; Fox, 1962; Kling et al., 1956; 
McCall, 1965, 1966; Premack & Collier, 1962; 
Roberts, Marx, & Collier, 1958; Wendt, Linds- 
ley, Adey, & Fox, 1963). The phenomenon of 
“spontaneous alternation” has also often been 
interpreted as supporting the view that there 
are non-need-reducing stimuli that are sub- 
ject to deprivation-satiation effects: given a 
choice between several alternatives, rats have 
been found to alternate consecutive selections 
with far greater frequency than expected by 
chance (Dember & Fowler, 1958). A depriva- 
tion-satiation account of such behavior is that 
on each trial there is an increment in satiation 
to the chosen alternative and a decrement in 
any preexisting satiation to the unchosen 
alternative, both effects reducing the proba- 
bility that the same alternative will be selected 
on the following trial. In accord with the 
deprivation-satiation account, intensive in- 
vestigation of rats’ maze behavior suggests 
that spontaneous alternation is not attribut- 
able simply to an avoidance of repeating motor 
responses or directional responses, but repre-
sents, at least in part, an avoidance of re-
peated contact with the exteroceptive stimula-
tion provided by the last chosen alternative
(Dember & Fowler, 1958; Eisenberger, Myers,
Sanders, & Shanab, 1970). Similarly, forced 
exposure without choice to novel or variable 
stimulation has often been found to reduce


Tan, 1952; 
light-continge 
1962). The h 
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ing statistically significant effects predicted Iy 
the deprivation-satiation hypothesis is Imp 
u^ the other hand, some investigations 0; 
visual exploration, locomotor poe E. 
light-contingent performance have e : d 
obtain increased responding with po 
tervals between stimulus exposures hes. 
worth & Thompson, 1957; agr ns 
1958; Haude & Ray, 1967; w ere on 
Kuchka, 1957, Experiment 2) or faile nding 
tain within-session decrements in jm 
following deprivation (Butler & ar Solo- 
1955; Montgomery, 1955; Thompson * 
mon, 1954; Welker, 1957). — handled 
Some of these exceptions might m pility 3 
by assuming that the degree of avatta inad- 
the stimulus being investigated PEY of 
vertently confounded with the availa a own 
competing stimuli. Glanzer (1961) o em- 
how a stimulus-satiation theory pi in ex 
ploy a response-competition iere jn in 
plaining failures to obtain predicted "ocomot 
responsiveness. Glanzer's analysis of fron 
exploration assumes that movement ) oc- 
maze unit (A) to another maze unit pled i” 
curs whenever a novel element 15 wert $ 
B and a novel element is not samples y 
Exposure to a stimulus decreases Mom Jan” 
while nonexposure increases its now AE wher 
zer’s model predicts, for example, [it 
withholding access to a maze makes sovene”! 
parts highly novel, the speed of Minim?! 
within the maze will initially be tin’ 
Contrary to the usually found CO n 
within-session decrement in spee E 
traversal, under such conditions SP sessio 
increase during the first part of dun 
the components of the maze pe 
familiar. This promising model ce 


perimental attention. 


Dual Process Theories 


Certain findings still pose difficulties, how-
ever, for the stimulus-satiation theories.
Glanzer's (1961) analysis of locomotor ex-
ploration cannot explain occasional failures to
find an increase in speed of movement from a
familiar enclosure to a novel area when the
duration of prior confinement in the home
area is increased (Charlesworth & Thompson,
1957). Nor can it explain the avoidance
observed with very novel or incongruous stim-
uli (e.g., Barnett, 1958; Hebb, 1949; Menzel,
1962; Montgomery, 1955; Welker, 1956).
Glanzer (1958) suggested that his stimulus-
satiation theory (Glanzer, 1953) skirted the
issue by assuming that the probability of “all
responses” to a novel stimulus decreases with
exposure and increases with nonexposure.
Thus, avoidance responses should show satia-
tion and recovery in the same way as ap-
proach responses. When the predominant reac-
tion to a novel stimulus is not approach or
avoidance but a conflicting combination of
the two tendencies, the problem remains of
predicting changes in overt behavior as a
function of the duration of stimulus presenta-
tion and withholding.

Montgomery (1955) proposed a dual pro-
cess theory similar to that of Glanzer's (1958)
to account for occasional failures to find in-
creases in exploration using a lengthened in-
terval between sessions or to find decreases
in exploration within a session that follows
deprivation. According to Montgomery, novel
stimuli elicit both an exploratory drive and
a fear drive, both motives decreasing during ex-
posure to a novel stimulus and increasing with
nonexposure. Withholding a novel stimulus
was held by Montgomery to reduce subse-
quent exploration whenever the between-ses-
sion increase in fear drive was greater than
the between-session increase in exploratory
drive. Further, exploration should increase
during exposure whenever the within-session
decrease in fear drive is greater than the
within-session decrease in exploratory drive.
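
Montgomery's two inequalities can be made concrete with a few lines of arithmetic. The sketch assumes, purely for illustration, that overt exploration tracks the difference between exploratory drive and fear drive; that subtraction rule, the function name, and the numbers are assumptions added here and are not part of Montgomery's statement.

```python
def change_in_exploration(change_in_exploratory_drive, change_in_fear_drive):
    """Net change in exploration, assuming exploration = exploratory drive - fear drive."""
    return change_in_exploratory_drive - change_in_fear_drive


# Between sessions (stimulus withheld): both drives increase.
# Fear grows by more than exploratory drive, so exploration drops.
between_sessions = change_in_exploration(+1.0, +3.0)

# Within a session (stimulus present): both drives decrease.
# Fear falls by more than exploratory drive, so exploration rises.
within_session = change_in_exploration(-1.0, -3.0)

print(between_sessions, within_session)
```

The signs of the two results reproduce the verbal predictions; the magnitudes carry no meaning.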
fame ery (1955) maintained rats in a 

Miliar cage from which they received three 
a 10-minute periods of access to a novel 

aight alley, For half the animals, an ele- 
ned alley was used, while an unelevated 
MP, Was used for the remaining eoe 
Would ney assumed that the two alleys 
tude cens approximately the same n: 
al ey exploratory drive but that the G pim 
A AP ages evoke the greater feat ane 
M ing to Montgomery's analysis, the ele- 

p ees should have shown less ex- 
BIS lon and more approach-avoidance con- 
th than the nonelevated group. As expected, 
tj; elevated group traversed more alley 

Uring each session than the elevated 
group. The conflict data also turned out as ex-
pected. The elevated group showed the greater
numbers of (a) traversals between the half 
of the start cage nearer to the novel alley and 
the half farther from the alley and (b) head 
turns toward and away from the novel alley. 

Montgomery's interpretation of these re- 
sults may be criticized on several grounds. 
Rather than demonstrating differences in con- 
flict, the approach-avoidance measures might 
reflect simply differences in random activity. 
Also, the lesser rate of maze traversal by the 
elevated group than nonelevated group might 
represent a lesser exploratory incentive of the 
elevated maze, rather than the elicitation of 
greater fear drive assumed by Montgomery. 
Moreover, the measure of exploration has 
been criticized for reasons that are subse- 
quently discussed. Of even greater difficulty 
for Montgomery's analysis are the many ways 
in which the presumed drives of exploration 
and fear might change within sessions and be- 
tween sessions to yield the observed changes 
over time in exploration. To test the as- 
sumption by both Glanzer (1958) and Mont- 
gomery (1955) that approach tendencies and 
avoidance tendencies decrease during exposure 
to novel stimulation and increase with sub- 
sequent nonexposure requires some method of 
differentiating an increase in approach tend- 
ency from a decrease in avoidance tendency, 
and vice versa. One possibility is the use of
drugs found to differentially affect the ap- 
proach and avoidance tendencies. 


BOREDOM DRIVE
Extending the Myers-Miller Analysis 


According to Montgomery’s theory, initial 
contact with a novel stimulus produces ex-
ploratory drive (see also Harlow, 1953).
Myers and Miller (1954) suggested an al- 
ternative drive explanation of the reward 
value of novel stimulation which does not 
violate the assumption of Hull and Miller that 
any rapid increase in drive is punishing. 
Myers and Miller held that a novel stimulus 
does reduce a drive which they termed “bore- 
dom.” This drive was caused, however, not by 
the novel stimulus itself, but by prior ex- 
posure to constant stimulation. What Myers 
had in mind was that extended exposure to 
constant stimulation produces a boredom drive
specific to that stimulation (A. K. Myers,
personal communication, June 1970). Bore-
dom drive was reduced by any change in
stimulation. 

Unlike the case with the stimulus-satiation 
theories, findings that extreme stimulus 
novelty or unexpectedness produce avoidance 
cause no problem for the Myers-Miller posi- 
tion. Just as forced stomach loading by tube 
of a very large quantity of nutriment or 
water would presumably be highly aversive 
due to considerable distension of the stomach 
and other discomforts, so one might assume 
that a forced “overloading” of stimulus variety 
or unexpectedness is drive inducing. 

More recent boredom-drive accounts (Fow- 
ler, 1965; Isaac, 1962; Jones, Wilkinson, 
& Braden, 1961) assume that exposure to a 
homogeneous stimulus not only serves to in- 
crease the reward value of a change in the 
stimulus but also increases preference for
greater variability among stimuli which are
qualitatively different from the exposure
stimulus. According to the more recent view,
for example, recent homogeneity of stimula-
tion in one sensory modality or stimulus di-
mension increases the reward value of varia-
tion in other sensory modalities or stimulus
dimensions. We know of only two experiments 
that have tested this prediction. 

In an experiment by Isaac (1962), monkeys
were confined in an opaque box and sub-
jected to a 2-hour period of pretest sensory
deprivation.


deprivation 


ain pulling 
duration of chain 
by subjects who had experienced pre- 
55 was greater than that 
cing pretest white noise 
s finding is predicted by the 
-drive accounts, the initia] 
noise being 
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animals would undergo the preceding pre- 
test conditions, but using some measure E 
E à -t period. Moté 
random activity during the test perio i 
over, contrary to the boredom-drive we 
the addition of light to a dark, quiet prea 
deprivation period did not reduce subsea i 
responsiveness for white noise. Since et ds 
experiment reported by Isaac indicate wa 
white noise was not rewarding for ca ‘te 
animals, one might argue in defense ME 
boredom-drive position that an enia. 
of white noise outweighed any uae t 
value stemming from its capacity to T 
boredom. gave 
Berlyne, Koenig, and Hirota (1966) Ee 
rats a 15-minute conditioning session 0^ ails 
of 4 days, with bar pressing produrre at 
second noise or a 1-second change in in with 
of illumination, these sessions i eem w 
4 days of extinction sessions. AII € to in 
preceded by 30 minutes of gne noise: 
termittent light change, geet jnter- 
both intermittent light change pt px- 
mittent noise, or no stimulus n 
periencing the test stimulus during e mance 
period did not significantly alter per pi - 
However, the adventitious pene house 
ported that rats which happened to e fof 
in a very noisy room responded av Jatio™ 
pretest stimulation than novel age b 
while the reverse was true for rats in ere 
a relatively quiet room. These diferen rie 
also found for extinction responding: lace 
of the 80 original animals were p con K 
failure to bar press during the ation a 
tioning session, there being no in nce ad y 
whether noisy versus quiet maintena ond- Ae 
differential effect upon failure to us con 
suming no relation between mainten riment, 
dition and elimination from the g esl 
boredom-drive interpretation. of wd 
might proceed as follows: Noisy "i 
produced substantial satiation a ; 
drive, When bar pressing yielded n gu jec^ 
lation for the noisy maintenant? ‘oll 
little responding occurred because satiati? 
z lete > eint 
necessary to produce nea tori pré 
of the boredom drive. When t 
yielded familiar stimulation re resP 
maintenance subjects, somewhat we pearl 
ing occurred as needed to pridie i 
plete satiation. On the other hano 


" 


NON-. 


housed in the quiet room had too high a level
of boredom drive for either the familiar stim-
ulation or novel stimulation to produce any-
where near complete satiation of the boredom
drive. These animals bar pressed more for
the novel stimulation than familiar stimula-
tion because of the greater incentive value of
the former. An additional implication of this
account is that the quiet maintenance sub-
jects should have shown more overall instru-
mental responding than the noisy maintenance
subjects because the noisy maintenance ani-
mals are presumed to have had their boredom
drive satiated with relatively little instru-
mental responding. Contrary to this expecta-
tion, however, inspection of the data reveals
little overall difference in performance be-
tween noisy maintenance subjects and quiet
maintenance subjects. More research is needed
before intermodality transfer of boredom ef-
fects may be concluded to have been defini-
tively demonstrated. The details of such ef-
fects provide a test not only of boredom-drive
theories but also of other theories discussed in
subsequent sections.
Hull-Spence Analyses


Some of the more recent boredom-drive in-


stimulation (Fowler, 1965, 1967; 
» 1961; Jones et al., 1961) employ the 
Spence conceptual framework. Accord- 
et this view all sources of drive, including 
fects 9m, should have general energizing ef- 
"eleva hese theorists contend that such “ir- 
shoo, t Sources of drive as hunger, thirst, 


abi," CT loud noise should increase the prob- 


ull. 
Ing 


ed onperience were unfamiliar, reduce both 
a drive and the probability of ex- 
Shy responses, 
Ste enslisly contradicting both viewpoints 
Stanja 8S that initial experiences of sub- 
o ing er, thirst, and shock e m 
T ex m and sometimes to decrease pag 
b, er d oration (Fowler, 1965, pp. 50-54). 
" ue 1965) has responded to such evidence 
“Honing the validity of one common 




measure of maze exploration, the number of 
maze sections traversed per unit time. Ac- 
cording to Fowler and others (e.g., Glanzer, 
1961; Zimbardo & Miller, 1958), exploratory 
behavior in the maze might be reflected in two 
competing exploratory tendencies: to investi- 
gate more maze sections per unit time and 
to examine individual maze sections in more 
detail. Fowler suggested that a better measure 
of exploration is the animal's readiness to
move from a thoroughly familiar surround to 
a novel stimulus. Moreover, following Brown 
(1961), Fowler also emphasized that Hull- 
Spence theory predicts hunger or thirst to in- 
crease the probability of exploration and simi- 
lar responses only when the responses are 
dominant in the animal's response hierarchy. 
According to Fowler (1965, p. 52), we may 
insure that responsiveness for stimulus change 
is the dominant response in hungry or thirsty 
animals by eliminating stimuli relevant to 
hunger or thirst and by making the stimulus 
change sufficiently great and/or giving suf-
ficiently great prior exposure to homogeneous
stimulation. 

As supporting evidence for the general-drive 
hypothesis, Fowler noted findings that hunger 
increases the animal's speed of movement from 
a familiar surround to novel surround and also 
that hunger and thirst increase the rate of 
light-contingent responding. However, while 
an evaluation of such evidence is important 
for our subsequent discussion of optimal-ac- 
tivation theories, its value in testing the gen- 
eral-drive hypothesis is limited. According to 
the Fowler-Jones account, heightened irrele- 
vant drive should increase the probability of 
the dominant response. But we do not know 
a priori the degrees of preexposure to homo- 
geneous stimulation or of test-stimulus varia- 
tion that are necessary to make dominant the 
response for stimulus variation. 
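
The general-drive prediction can be illustrated with the multiplicative rule usually associated with Hull-Spence theory, in which reaction potential is the product of drive and habit strength. The habit and drive values below are hypothetical, and the sketch shows only the direction of the effect: raising an irrelevant drive widens the margin favoring whichever response already has the greater habit strength.

```python
def reaction_potential(drive, habit_strength):
    """Hull-Spence style reaction potential: E = D * H."""
    return drive * habit_strength


def dominance_margin(drive, dominant_habit, nondominant_habit):
    """Advantage of the dominant response over its competitor at a given drive."""
    return (reaction_potential(drive, dominant_habit)
            - reaction_potential(drive, nondominant_habit))


# Hypothetical values: total drive rises as an irrelevant drive is added.
drive_levels = [1.0, 2.0, 3.0]
margins = [dominance_margin(d, dominant_habit=0.6, nondominant_habit=0.4)
           for d in drive_levels]
print(margins)
```

Which response counts as dominant has to be set by hand in this example, which is exactly the quantity the text says is not known a priori.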

One solution would be to employ within a 
single experiment not only several degrees of 
hunger or some other irrelevant drive but also 
a broad range of values of the homogeneity of 
pretest stimulation or of variation in the con-
tingent stimulus. According to the Fowler-
Jones theory, increasing the level of irrelevant
drive should improve asymptotic performance 
of a dominant response for stimulus variation, 
such dominance resulting when the pretest 
stimulation is quite homogeneous or when the
contingent stimulation is quite variable. In- 
creasing the level of irrelevant drive should 
worsen asymptotic performance of a non- 
dominant response for stimulus variation, non- 
dominance resulting when the pretest stimula- 
tion is quite heterogeneous or when the con- 
tingent stimulation is not very variable. 
Another satisfactory procedure is to give 
animals a single choice between a pair of 
stimuli differing in their sensory variation. If
we may assume that choosing with greater 
probability one alternative of the pair repre- 
sents the dominant response, then the Fowler- 
Jones account generally predicts that increas- 
ing the level of irrelevant drive will increase 
preference for the alternative preferred under 
low irrelevant drive. Fortunately, there is
some evidence on how irrelevant drives alter
choice between novel and familiar stimulation. 
Only studies in which data are reported for
the initial choice between the familiar stimulus 
and novel stimulus are considered here. When 
free-choice data are averaged over trials 
(Richards & Leslie, 1962), the analysis in 
terms of Hull-Spence theory becomes much 
more complex. While data are not available on 
the effect of irrelevant appetitive drives, sev- 
eral studies have investigated the effects of 
pretest aversive stimulation on choice between 
novel and familiar stimuli (Aitken & Sheldon, 
1970; Haywood & Wachs, 1967; Sheldon, 
1968; Thompson & Higgins, 1958). In each 
study, animals not receiving prechoice aversive


us. It was reported 
f the novel alterna- 
mulation dropped to 
heldon, 1970: Hay- 
eldon, 1968, Experi- 
(Sheldon, 1968, Ex- 
Higgins, 1958). As 
967) pointed out,
ceptive stimulation decrease the probability
of a dominant exploratory response.

Another important implication of the
Fowler-Jones position, which to this writer's
knowledge has not received empirical test, is
that boredom itself should act as an irrelevant
drive to energize such behaviors as eating and
drinking.


OPTIMAL ACTIVATION 
The Fiske-Maddi Theory 


Extending earlier conceptions of Hebb
(1955) and Leuba (1955), Fiske and Maddi
(1961) proposed that animals strive to maintain
an intermediate level of “overall stimulus impact.”
Fiske and Maddi used the term impact to
refer to exteroceptive, interoceptive, and cere-
bral influences on reticular activation. Impact
was said to be a monotonic function of the
intensity, meaningfulness, and variation of
stimulation. Activation varied not only with
short-term environmental conditions but also
with the organism’s characteristic sleep-wake-
fulness cycle. This cycle resulted from habitua-
tion to environmental stimulation and perhaps
from some fundamental physiological rhythm.


In the absence of a specific goal for which it
was striving, the organism attempted through
exploration and related behaviors to maintain
activation at the level normal for that point in
its sleep-wakefulness cycle. With a specific task
at hand (e.g., working for food), the organism
tended to modify its overall activation to the
level appropriate for maximally effective per-
formance of the instrumental response and of
the goal behavior itself.
The Fiske-Maddi theory (Fiske & Maddi,
1961, p. 272) predicts that an optimal level
of activation may be sustained in the absence
of normal exteroceptive stimulus variation by
a sufficient level of interoceptive or exterocep-
tive stimulus intensity. Thus, hunger, thirst,
shock, or noise should decrease
performance for variation of other stimuli.
In accord with the Fiske-Maddi account, the pre-
viously discussed evidence indicates that shock
and white noise increase rats’ preference for
familiar stimulation over novel stimulation.
However, there is some evidence suggesting
that hunger and thirst increase or have no
effect upon approach to novel stimulation.

Several studies report that hunger increases
the speed of movement from a familiar sur-
round to a novel surround (Bolles & DeLorge,
1962; Fehrer, 1956; Zimbardo & Miller,
1958). While it might be argued that the
increased speed simply reflects an increase in
general activity, rather than an increased
reward value of the novel stimulation, Zim-
bardo and Miller (1958) stated that the
hungry group "was facing the door [to the
novel stimulation] and was in the sections of
E. n ans nearer to the door when it 
of fae ped [p. 45]" on a higher proportion 
of trials than was the nonhungry group. Fur-
thermore, Richards and Leslie (1962) reported
that both hunger and thirst increased the per-
centage of choice of the novel alternative. Hughes
(1965) gave rats 15 minutes of free explora-
tion in a divided six-unit box, half the units
being a novel color and half a familiar color.
The hungry rats traversed more total maze
units than nonhungry rats, and the two groups
differed neither in the total time spent in the
novel half of the maze nor in the ratio of
novel units traversed to familiar units tra-
versed.


Other studies have examined the ef-
fects of hunger or thirst on light-contingent
bar pressing by rats. Since we wish to de-
termine the effects of intense internal stimula-
tion upon the reward value of light-contingent
bar pressing qua novel or variable stimula-
tion, it would not do to have strongly compet-
ing sources of stimulus variation present in
the test situation (Leaton, Symmes, & Barry,
1963). Thus, only studies employing pretest
adaptation to the test situation are examined.
Additionally, since different pretest conditions
often produce rather small differences in light-
contingent free-operant performance, it is im-
portant to assess the possible effect of hunger
or thirst to alter the unreinforced level of
free-operant performance. A high unreinforced
rate of instrumental responding has sometimes
been found to have incremental effects
(Schaeffer, 1965) and sometimes decremental
effects (Eisenberger, Karpman, & Trattner,
1967) on the reinforced response rate. Studies by
le s (1959) and Kiernan (1965) meet 
these requirements since (a) pretest
adaptation was employed in the test apparatus
with the bar nonfunctional (no light reward);
and (b) by the last pretest adaptation ses-
sion, the rate of nonfunctional bar pressing
differed little as a function of degree of hunger 
or thirst. Both studies reported a greater rate 
of light-contingent bar pressing with increased 
hunger or thirst. Unfortunately, even these 
results are not definitive. Campbell and Shef- 
field (1953) reported that stimulus change 
produces greater random activity in hungry 
animals than nonhungry animals. Although in 
the preceding studies the pretest levels of non- 
functional bar pressing were equated across 
hunger or thirst conditions, it is still possible 
that illumination change simply evoked more 
behavior among the hungrier or thirstier ani- 
mals than satiated animals in the area ad- 
jacent to the light-contingent bar, and by this 
means produced a greater rate of bar press- 
ing (Kiernan, 1964). One solution to the be- 
havior-evocation problem would be the use of 
extinction sessions interspersed with, or fol- 
lowing, the light-reinforcement sessions (Ber- 
lyne & Koenig, 1965; Berlyne, Salapatek, 
Gelman, & Zener, 1964). If no differences oc- 
curred among groups in the unreinforced rate 
of instrumental responding, differences in per- 
formance during extinction would hopefully be 
attributable to reinforcement effects of il- 
lumination change and not simply to behavior 
evoked by illumination change. Another al- 
ternative is the employment of a discrete-trial 
choice situation with one alternative produc- 
ing a change in illumination. In sum, there is 
insufficient evidence to judge whether, and in 
what direction, hunger and thirst alter the 
reward value of illumination change. While, in 
accord with the Fiske-Maddi theory, there is 
some evidence suggesting that shock and 
white noise reduce preference for a novel al- 
ternative, hunger and thirst have been found 
to increase or have no effect on approach to 
novel stimulation. 

Fowler (1965) has questioned whether the 
Fiske-Maddi theory can account for findings 
that very hungry rats choose a new goal arm 
on a second free-choice trial in a T maze even 
when the first choice is rewarded with food 
(e.g., Fowler, Blond, & Dember, 1959) or
escape from shock (Fowler, Fowler, & Dember, 
1959). Such findings contradict Fiske and 
Maddi's assumption that performance which 
satisfies any strong drive or motive (e.g.,
choice by hungry rats of a goal arm which
provides food) dominates over exploratory
tendencies (Fiske & Maddi, 1961, p. 35). A 
possible defense of the Fiske-Maddi position 
would be that choice alternation in such a 
situation does not actually represent an ex- 
ploratory motive: when the first T maze goal- 
arm entry is rewarded with food, the argu- 
ment would run, some sort of temporary in- 
hibition due to the food reward simply re- 
duces the likelihood of immediate selection of 
the same alternative. It follows that a very
hungry food-reinforced rat should repeat its 
initial response at greater-than-chance level if 
the intertrial interval is made sufficiently long 
(cf. Walker, 1958). There is, however, evi- 
dence against this prediction (Denny, 1957; 
Denny & Leckart, 1965; Zeaman & House, 
1951). 


Berlyne’s Theory 


While Hebb and Fiske and Maddi regarded 
changes in activation toward an intermediate 
level as being rewarding, Berlyne (1960, 
1963) initially retained a modification of the 
more traditional drive-reduction model. Like 
Hebb, Berlyne viewed reticular activation as 
an energizing general-drive state. But only 
those stimuli that produce a reduction in ac- 
tivation were rewarding. Activation was pos- 
tulated to be a U-shaped function of the 
novelty, surprisingness, and complexity of sen-
sory input. High values of sensory input were
said to increase activation by increasing the
activity of sensory collaterals which synapse
in the reticular formation. Low sensory input
led to increased activation by reducing cortical
inhibitory restraint on reticular activity. Addi-
tionally, very great stimulus intensity or varia-
tion was assumed to reduce activation.


An “arousal jag” was also postulated: a
temporary increase in activation sought for
the subsequent pleasure of activation reduction.

Recently, Berlyne (1967, 1969) has taken
the position that increases in activation can
be rewarding even when not followed by a
subsequent drop in activation. Small or mod-
erate increases in activation are held to be
rewarding, except when the organism is al-
ready very highly activated. For a very highly
activated organism, the greater the activation
increment produced by a stimulus, the greater
the stimulus’ aversiveness. Berlyne (1967, p.
12) no longer identifies activation with any
particular central nervous system
process. He still maintains that an inordinately
low amount of stimulus intensity or variety
will produce a high level of activation and that
decrements in activation from a high level to
a moderate level will be rewarding (Berlyne,
1967, p. 30). Berlyne does not explicitly state,
but seems to imply, that he has moved from
the view that decreases in activation from
moderate levels to low levels will be reward-
ing (Berlyne, 1967, p. 30). Something like the
arousal jag still appears to be with us (Ber-
lyne, 1967, p. 94). Berlyne continues to as-
sume that activation is associated with general
energizing effects (Berlyne, 1966) and
that the level of activation will drop when
stimulus intensity or variation becomes very
high (Berlyne, 1969, p. 206).

Unlike the Fiske-Maddi theory in which the
reward value of stimulus variation is simply
a decreasing function of the intensity of in-
teroceptive or exteroceptive stimulation, Ber-
lyne (1969) has recently offered two alterna-
tive models, both of which assume a non-
monotonic relation between the organism's
activation level and the reward value of
stimulus variation. In one model the activa-
tion increment caused by a given stimulus is
an increasing function of the organism's ac-
tivation level, and the maximally rewarding
amount of activation increment does not change
with the organism’s activation level. In the
second model the activation increment pro-
duced by a given stimulus does not change
with the organism's activation level, and the
maximally rewarding amount of activation
increment is an inverted U-shaped function
of the organism's activation level. For each
model, Berlyne (1969) offers predictions only for
each of three ranges of organismic activation:
subnormal, normal, and supranormal.



Since he fails to provide an independent
method for empirically identifying these
ranges and since the assumed relation between
organismic activation and reward value is
nonmonotonic, an adequate test of the models
requires an ambitious experiment. The reward
value of a moderately variable stimulus needs
to be tested using preadaptation conditions
whose effects are presumed to range in small
degrees from very low activation to very
great activation.
Berlyne has postulated a number of pro-
cesses affecting activation and the relations
among activation, reinforcement, and per-
formance. Some details of the individual pro-
cesses and of the manner in which they inter-
act have been left unspecified, with resultant
ambiguities, for example: (a) According to
Berlyne's models, the activation increment
produced by a moderately variable stimulus is
either increased (Model 1) or remains un-
changed (Model 2) as a function of the
organism's activation level. On the other hand,
from Berlyne’s assumption that low sensory
input produces high activation, it follows that
the presentation of a moderately variable stimu-
lus to a sensorily deprived animal should re-
duce the activation level. Thus, there is am-
biguity over the direction of change of ac-
tivation produced by variable stimulation.
(b) At very high levels of stimulus input,
Berlyne assumes that activation drops. It is
not clear how the drop should affect the
reward value of stimuli of various degrees of
variety or intensity. (c) Both of Berlyne's
models predict that animals which are highly
activated due to intense hunger or recent
shock will avoid additionally activating stim-
uli. Yet, the assumption by Berlyne that ac-
tivation is a general energizer yields the
prediction that hunger or recent shock will
strengthen any dominant approach tendency for
novel or variable stimulation. Moreover,
since the activation level is assumed to drop
when stimulus intensity or variation becomes


very high, is it possible that the hungry or
recently shocked animal might withstand a
temporary increase in activation for the ensuing
relief of activation provided by these processes?


ADAPTATION LEVEL 


Adaptation Levels for Individual Stimulus 
Dimensions: The McClelland-Clark Theory 


McClelland and Clark (1953) proposed a 
motivation theory which drew heavily on Hel- 
son's adaptation level concept and Hebb's 
(1949) learning theory. Hebb (1949) sug- 
gested that small discrepancies between ex- 
pectation and experience are pleasurable, while 
large discrepancies are unpleasant. Helson 
(1947, 1959, 1964) postulated a judgmental 
frame of reference; a null point above which 
a stimulus is reacted to as high on a particular 
dimension and below which it is reacted to as
low on that dimension. Helson postulated that 
the adaptation level for any stimulus dimen- 
sion is a weighted average of the relevant 
focal stimuli, background stimuli, and the 
residual stimuli from previous experience and 
from constitutional and organic factors. Mc- 
Clelland and Clark (1953) assume that stim- 
ulation having the same value as the adapta- 
tion level is affectively neutral, the magnitude 
of affect associated with a stimulus being a 
curvilinear function of the magnitude of dis-
crepancy between it and the adaptation level.
Small discrepancies produce positive affect, 
large discrepancies negative affect. More spe- 
cifically, as the discrepancy is increased from 
nil, affective value first becomes increasingly 
positive, then decreasingly positive, and fi- 
nally increasingly negative. 
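
The two ingredients of this account, Helson's weighted average and the curvilinear affect function, can be illustrated numerically. The sketch below is hypothetical throughout: the arithmetic form of the average, the weights, the quadratic shape of the affect curve, and the `peak` parameter are assumptions chosen only to reproduce the qualitative pattern just described, and none of them is specified by Helson or by McClelland and Clark.

```python
def adaptation_level(focal, background, residual, weights=(0.5, 0.3, 0.2)):
    # Weighted average of focal, background, and residual stimulation.
    # The weights are illustrative assumptions.
    w_focal, w_background, w_residual = weights
    total = w_focal + w_background + w_residual
    return (w_focal * focal + w_background * background
            + w_residual * residual) / total


def affect(stimulus, level, peak=1.0):
    # Hypothetical curvilinear affect function of the discrepancy d:
    # zero at d = 0, most positive at d = peak, zero again at d = 2 * peak,
    # and increasingly negative beyond that point.
    d = abs(stimulus - level)
    return d * (2.0 * peak - d)


level = adaptation_level(focal=10.0, background=10.0, residual=10.0)
neutral = affect(10.0, level)           # stimulation at the adaptation level
small_discrepancy = affect(10.5, level)
moderate_discrepancy = affect(11.0, level)
large_discrepancy = affect(13.0, level)
```

Because the function depends only on the absolute discrepancy, increments and decrements of equal size receive the same affective value in this sketch.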

Supporting evidence for the McClelland- 
Clark theory comes from a study by Haber 
(1958), using humans, which reported water- 
temperature preference to first increase, then 
decrease as a function of the water's dis- 
crepancy from the normal (adaptation level) 
skin temperature. The predicted relation be- 
tween deviation from adaptation level and 
preference was also found when subjects were 
adapted to water of a temperature 1 or 2 
degrees greater than the normal skin tempera- 
ture. The effect was not obtained, however, 
using even higher adapting temperatures, and 
Haber suggested that with these higher tem- 
peratures the adaptation period may simply 
have been too short for full adaptation to 
occur.

Additional relevant evidence comes from 
light-contingent bar press studies with rats, 




which employed preadaptation to a given light 
intensity and tested rate of instrumental re- 
sponding as a function of the magnitude of 
contingent change in illumination from the 
preadaptation intensity to consequent inten-
sity. McCall (1965) factorially combined four
initial light intensities (.26, .85, 2.75, 8.73
millilamberts) with four consequent inten-
sities having the same values. Each rat was
given a daily operant pretest for 5 consecu- 
tive days, with the bar nonfunctional, fol- 
lowed by 5 days of light-contingent bar press- 
ing and 2 days of extinction. Each session 
consisted of 10-minute preadaptation to the 
initial intensity, followed by 15-minute access 
to the bar. The rate of bar pressing was found 
to increase, the greater the increment or decre- 
ment of intensity from the initial level. This 
finding is consonant with the McClelland- 
Clark theory if it is assumed that the con-
sequent intensities were rather similar per-
ceptually to the initial adapting intensities.
The effect did not obtain, however, for ex- 
tinction responding, leading Berlyne (1969) 
to indicate that the differences in bar pressing 
during acquisition may have been due to 
some behavior-evoking effect of light change, 
rather than a reinforcing effect of light change. 
In a subsequent experiment by McCall 
(1966), rats were given three operant pretests 
with the bar nonfunctional, followed by nine 
contingency sessions. Each session consisted 
of 6-minute preadaptation to an initial in- 
tensity, followed by 6-minute access to the 
bar. Using intensities of .10, .18, .31, .54, .94,
1.64, 2.87, 5.03, and 8.80 millilamberts, 41 
initial-consequent intensity combinations were 
selected for use. For the first 3 days of test- 
ing, the rate of bar pressing increased, the 
greater the increment or decrement in in- 
tensity from the initial intensity. However, 
contrary to what the McClelland-Clark theory 
would predict, preferences for dim light came 
to dominate by about the fourth day, and 
light increment ceased to be reinforcing. Re-
sponding thereafter was an increasing function
of the magnitude of the illumination decre- 
ment. Unfortunately, controls were not em- 
ployed to separate the reinforcing effects of 
light change from its possible behavior-evoca- 
tion effects. McCall’s (1965) failure to find a 
relation between the rate of bar pressing in
extinction and the magnitude of light change
in acquisition contradicts the McClelland-
Clark theory.

Moreover, work on gustatory stimulation
indicates that contrary to the McClelland-
Clark theory, water will be preferred to a
quinine solution whose concentration equals
or exceeds threshold (Young, 1966).
The theory would also appear not easily cap-
able of explaining the preference changes pro-
duced by certain combinations of tastes.
Rats prefer a sucrose solution to water and
prefer water to a quinine solution. Yet, a
concentration of quinine in solution with a
high concentration of sucrose will be preferred
to a sucrose solution of equal concentration
(Young, 1966, p. 68).

Finally, there are certain stimulus dimen-
sions for which large deviations from accus-
tomed stimulation are rewarding. For example,
greatly increasing the amount of food reward
which hungry rats receive for an instrumental
response produces in many situations a rapid
and marked improvement in performance. Fol-
lowing McClelland and Clark's interpretation
of the judged pleasantness of highly concen-
trated sucrose solutions, one might claim that
there are too few just noticeable differences
along the dimension of amount of food reward
for even large deviations from expectation to
produce negative affect. However, contrary to
this argument, a shift of equal magnitude
from accustomed large food reward to small
food reward usually produces a rapid and
marked decrement in performance. One might,
of course, assume that a change from large
reward to small reward produces the perform-
ance decrement because it is more extreme,
as measured in just noticeable differences,
than the reverse change. This explanation
seems unlikely but merits testing.

Helson (1947, 1959, 1964) raised the pos-
sibility of not only an adaptation level for
individual stimulus dimensions but also a
general adaptation level to which several stim-
ulus dimensions contribute. A theory by
Glanzer (1958), while not stated in adaptation
level terminology, closely approximates the
form such an adaptation level theory might take.
According to Glanzer, the magnitude of
approach or avoidance elicited by stimulus
variation is related to the amount of “in-
formation” the stimulation imparts. Informa-
tion, a concept derived from communica-
tion theory, denotes the unpredictability of events
(Shannon & Weaver, 1949). The preferred
rate of information input was held by Glanzer
to be the average of those values experienced
by the animal throughout its life. According to
this analysis, the greater the extent to
which the current rate of processed informa-
tion falls below the accustomed amount, the
more the organism will act to increase the
rate through increased exposure to stimulus
variation. The greater the extent to which the
current rate falls above the accustomed level,
the more the organism will decrease its ex-
posure to variable stimulation. The average
rate of stimulus variation might simply be
viewed as an adaptation level, to which prior
experience contributes greatly. Glanzer's the-
ory may be extended by assuming that changes in
the tendency to approach or avoid variable
stimulation are related to changes in the re-
ward value or punishment value of the stimu-
lation. While the following attempt to apply
Glanzer's theory to current experimental evi-
dence is perhaps rather tedious, it well il-
lustrates many of the conceptual ambiguities,
methodological shortcomings, and empirical
inconsistencies so typical of current work on
exploratory behavior (see also the
Discussion section of this review).
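
The quantities in Glanzer's account can be made concrete with a short sketch. Only the Shannon measure is standard; the function names, the use of a simple arithmetic mean over discrete past rates, and the linear difference used for the tendency are illustrative assumptions rather than Glanzer's own formulation.

```python
import math


def information_bits(probabilities):
    # Shannon's measure: average unpredictability of a set of events, in bits.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def preferred_rate(past_rates):
    # Preferred rate taken as the average of the rates experienced so far.
    return sum(past_rates) / len(past_rates)


def tendency_to_seek_variation(current_rate, past_rates):
    # Positive values: act to increase exposure to stimulus variation.
    # Negative values: act to decrease exposure.
    # The linear form is an assumption for illustration.
    return preferred_rate(past_rates) - current_rate


coin_toss = information_bits([0.5, 0.5])
four_equal_events = information_bits([0.25, 0.25, 0.25, 0.25])
deprived = tendency_to_seek_variation(1.0, [2.0, 2.0])
overloaded = tendency_to_seek_variation(3.0, [2.0, 2.0])
```

In this sketch an animal whose current rate falls below its accustomed average obtains a positive tendency, and one whose current rate exceeds the average obtains a negative tendency, matching the two directional predictions stated above.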
Predictions of Glanzer's theory are
simplified considerably when the variation-
producing response used for the test situation
has a moderate response strength of its own,
independent of the consequences of the con-
tingency. Either an improvement or worsening
of performance as a result of the contingency
may then be detected. The preceding condi-
tion satisfied, several assumptions are required
in order to make predictions for specific test
situations. First, there is the question of
whether the contingent stimulus provides
more variation or less variation than that pro-
vided by the other (background) stimuli of
the test situation. Due to ambiguity in
Glanzer's definition of variation, the assump-
tions of an increase or of a decrease in ex-
perienced variation are sometimes equally
plausible. For example, studies by Hunt and
Quay (1961) and Meier, Foshee, Wittrig, 
Peeler, and Huff (1960) employed two levels 
of stimulus heterogeneity during rearing, being 
followed when the animals were older by a 
test situation in which responding served to 
change stimulation from one of the levels of 
heterogeneity to the other. In the test situa- 
tion, it is unclear for the subjects reared with 
the stimulus of greater heterogeneity whether 
the change from experiencing the heteroge- 
neous but thoroughly familiar stimulus to 
experiencing the homogeneous but unfamiliar 
stimulus should be assumed to decrease or to 
increase the rate of experienced variation. In 
the subsequently discussed test situations the 
contingent stimuli appear to have been more 
novel than the rest of the test situation, so 
that instrumental responding presumably 
served to increase the rate of experienced 
variation. A second assumption must be made 
concerning possible differences across groups 
in the amount of experienced variation re- 
sulting from instrumental responses. Predic- 
tion is straightforward only when the amount 
of experienced variation resulting from an in- 
strumental response differs little as a function 
of pretest differences in variation. Such an 
assumption seems most reasonable when the 
contingent test stimulation is associated with 
a different sensory modality or stimulus di- 
mension than the differential pretest stimula- 
tion. In the studies to be discussed, the con- 
tingent test stimulation resembled the differ- 
ential pretest stimulation to a rather limited 
degree, so there is at least some justification 
for assuming that the change in stimulus vari- 
ation resulting from instrumental responses 
did not differ across groups. In the planning 
of future studies which test the effects of pre- 
test differences of rate of experienced varia- 
tion upon the preferred rate of variation, 
greater attention would seem strongly ad- 
visable to the problem of making the con- 
tingent test stimulation substantially different 
from the differential pretest stimulation. 
According to Glanzer’s theory, short-term 
exposure to homogeneous stimulation should 
slightly worsen the animal’s subsequent per- 
formance for stimulus variation. On the other 
hand, the boredom-drive theories predict that 
short-term exposure to homogeneous stimula- 




tion will improve subsequent performance for 
stimulus variation. For the preceding predic- 
tions to hold, one must assume that the loss 
in the experienced rate of stimulus variation 
which results from a period of stimulus ho- 
mogeneity exceeds any increase in experienced 
rate of variation resulting from the possible 
novelty of the homogeneous situation. Data 
on the predictions are available for several 
behaviors (light-contingent performance, loco- 
motor exploration, and visual exploration) and 
species (rats, monkeys, and humans). Severe 
space restriction prevents documentation here
of the conclusion that contrary to Glanzer's 
theory, short-term stimulus homogeneity is 
often found to improve performance for stim- 
ulus variation and that failures to find im- 
proved performance also occur with sufficient 
frequency to question the generality of the 
boredom-drive position. The inconsistency of 
results does not seem to be attributable simply 
to weakness of the effect since improvement in 
performance following a period of stimulus 
homogeneity is sometimes of considerable 
magnitude. The variables responsible for the
inconsistency have not been identified. 
Another prediction of Glanzer's theory is 
that long-term experience of a given rate of 
stimulus variation will substantially shift pref-
erence for variation in the direction of the
experienced rate. While in accord with 
Glanzer’s theory, greater stimulus heteroge- 
neity during long-term rearing has often been 
found to enhance subsequent performance for 
stimulus variation, worsened performance has 
also often been found. Again, severe space 


restriction prevents documentation of this con- 
clusion. 


An additional inadequacy of Glanzer's the-
ory and some other theories of curiosity and
exploratory behavior (Jones, Gardner, &
Thornton, 1967; Jones et al., 1961; Mun-
singer & Kessen, 1964) is the limitation im-
posed by using the information measure to
denote stimulus variation. First of all, as Vitz
(1966a) stated:


The information formula . . . accounts for the num-
es between them [p. 74].


Second, although an animal's exploratory be-
havior depends on the similarity of the present
situation to previous situations, one may not
in general compute an information value for
the deviation of present stimulation from
prior stimulation. Such use of the informa-
tion measure is not possible when the prob-
abilities used to generate prior events are dis-
similar to the probabilities used to generate
present events.


UNEXPECTEDNESS


Dember and Earl (1957) have proposed
that stimulus unexpectedness is responsible
for the occurrence of the behaviors of ex-
ploration, manipulation, and curiosity. Ac-
cording to this account, attention is elicited
only when a stimulus’ unexpectedness falls in
a moderate, acceptable range. Such a stimulus,
termed a “pacer,” elicits the modal amount of
attention, other stimuli eliciting progressively
less attention as a function of their dissimi-
larity from the pacer stimulus. The preferred
level of unexpectedness is not so great as that
provided by the pacer stimulus. Exposure to
a pacer stimulus reduces the stimulus' unex-
pectedness and, on the relevant stimulus di-
mension, increases both the preferred degree
of unexpectedness and the unexpectedness nec-
essary for a stimulus to fall in the pacer
range. Thus, after sufficiently long exposure
to a pacer stimulus, the stimulus will no
longer fall in the pacer range. Decreases in
the minimal unexpectedness necessary for
stimuli to fall in the pacer range can result
only from “anxiety,” and Dember and Earl
(1957, p. 95) assume they are uncommon in
nonanxious animals.


To test the prediction that changes in
the preferred level of unexpectedness will be
in the direction of increased unexpectedness,
Dember, Earl, and Paradise (1957) gave rats
two or five daily 1-hour sessions of free ex-
ploration in an 8-shaped maze, the wall
pattern of one loop affording greater spatial
variation than the wall pattern of the other
loop. For some animals, a horizontal
pattern of alternating black and
white stripes was paired against a background of
a solid black or solid white. The
horizontal stripes presented more varia-
tion and were assumed to provide greater
unexpectedness than the solid background.
Other animals had the horizontal stripe pat-
tern paired against a vertical pattern of al-
ternating black and white stripes. Assuming
that the rats demonstrated more horizontal
maze movement than vertical maze movement,
the vertical pattern provided the greater
heterogeneity. Here again, the more
heterogeneous stimulus was assumed to be the
more unexpected. The preferred loop was taken
to be that in which an animal spent the
greater time. As predicted by the Dember-Earl
theory, of 13 animals that in later sessions
preferred a loop different from the one pre-
ferred in the first session, 12 shifted to a more
heterogeneous pattern. This finding has been
generally replicated in similar studies using
rats (May, 1968; Sackett, 1967; Walker &
Walker, 1964) and cats (Thomas, 1969). Ad-
ditional supporting evidence comes from
studies employing human subjects which re-
port that with a sufficiently great number of
presentations of random two-dimensional pat-
terns there occurs an increase in verbal prefer-
ence for patterns that are more heterogeneous
(Munsinger & Kessen, 1964, Experiment 8;
Vitz, 1966b). The preceding evidence gen-
erally favors the prediction that if there is a
change in preference for unexpectedness, it
will be in the direction of increased unex-
pectedness, at least insofar as unexpectedness
is assumed to increase with stimulus heteroge-
neity. While most of the studies used a prefer-
ence measure of duration of stimulus contact, similar
results were obtained using more adequate
preference procedures (May, 1968; Munsinger
& Kessen, 1964).

On the other hand, Walker and Walker
(1964) noted that in their study the overall
effect was weak and that an analysis of
within-session 6-minute intervals of respond-
ing of individual subjects revealed a number
of brief preference “regressions” to stimuli
having lesser heterogeneity. Walker and
Walker (1964) suggested, “the Dember and
Earl theory appears to hold only when the
unit of behavior employed is sufficiently large
that the general trend toward interaction with
stimuli of greater complexity becomes over-
riding [p. 495].” Thomas (1969, p. 301) has
also suggested on the basis of his own data
that the theory's predictions are somewhat
inaccurate over brief periods of time. These 
criticisms incorrectly assume that the oc- 
currence of preference regressions to stimuli 
having a lesser number of perceptible elements 
necessarily contradicts the Dember-Earl the- 
ory. Dember and Earl (1957) never assumed 
that increased within-stimulus homogeneity is 
always associated with increased expectedness. 
For example, a sufficiently long duration of 
exposure to a high level of heterogeneity on a 
given stimulus dimension should decrease the 
unexpectedness of heterogeneity and increase 
the unexpectedness of homogeneity on that 
dimension. It should thus sometimes be pos- 
sible to render an initially less preferred, rela- 
tively homogeneous stimulus more unexpected 
than, and preferred to, a relatively heteroge- 
neous exposure stimulus. An adequate test of 
this hypothesis requires measuring the rela- 
tive preferences for two stimuli, both before 
and following exposure to the more heteroge- 
neous stimulus of the pair. We know of no 
relevant experimental evidence on this ques- 
tion. The difficulty with the Dember-Earl the- 
ory is not that its predictions are incom- 
patible with preference regressions to stimuli 
that are more homogeneous, but that like the 
other theories we have discussed, it does not 
specify for a number of situations whether, or 
when, to expect such regressions. 

Other ambiguities in the theory and in- 
adequacies of prediction are as follows: (a) 
A stimulus eliciting the modal amount of 
stimulus contact is assumed by Dember and 
Earl both to have the preferred level of un- 
expectedness and to be the best attention- 
getter. Yet, the preferred level of unexpectedness
is assumed to be less than the level of
unexpectedness eliciting the most attention.
(b) Dember and Earl's (1957, p. 95) assump- 
tion that an animal will not respond to a set 
of stimuli which lacks a pacer stimulus con- 
tradicts their other assumption that the pre- 
ferred level of unexpectedness is less than that 
of the pacer range. (c) The authors emphasize 
that both approach and avoidance may be 
taken as evidence of attention. The assump- 
tion that animals attend most to stimuli hav- 
ing some intermediate level of unexpectedness 
seems contradicted by previously mentioned 
findings that highly incongruous or highly 
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novel events often evoke intense emotionality 
and avoidance. (d) The preferred level of unexpectedness
is assumed to be lessened only
by anxiety, and Dember and Earl limit their 
analysis to nonanxious animals. But the pre- 
sentation of unexpected stimulation itself 
often produces signs of anxiety. Adequate test 
of the theory’s predictions requires inde- 
pendent assessment of between-session and 
within-session changes in anxiety, a procedure 
not included in studies purporting to test the 
theory. Previously discussed evidence that 
aversive pretest stimulation reduces choice of 
a novel alternative supports Dember and 
Earl’s assumption that anxiety reduces pre- 
ferred unexpectedness. (e) If one assumes, as 
do Dember and Earl (1957, p. 94), that pre- 
sentation of homogeneous stimulation usually 
does not alter the preferred level of unex- 
pectedness since homogeneous stimulation gen- 
erally falls below the pacer range, the theory 
is contradicted by evidence that short-term 
exposure to homogeneous stimulation often 
increases the preference for heterogeneous 
stimulation. 


DRIVE-INDUCTION THEORY


Sheffield has rejected the idea that drive 
reduction is the source of primary reward. 
According to the most recent statement of 
Sheffield’s theory (Sheffield, 1965, 1966), a 
state of appetitive drive sensitizes the or- 
ganism’s “activation mechanism” toward en- 
ergizing the appropriate innate consummatory 
response. If the innate consummatory stimulus is made available, consummatory
excitement will be channeled exclusively into the consummatory response. Stimuli
that precede presentation of the consummatory stimulus become classically
conditioned to the consummatory response. Occurrence of these conditioned
stimuli under circumstances in which the consummatory response cannot occur
(absence of the consummatory stimulus) causes the activation mechanism to shunt
consummatory excitement into any ongoing behavior. In instrumental learning
situations, response-produced stimuli of the instrumental response [...] become
stably conditioned to the consummatory response. Thus, the instrumental response
is the most energized, and due simply to its greater frequency of practice, the
instrumental response becomes the dominant habit. In such situations, operations
which increase the energization of the consummatory response (e.g., deprivation)
also increase the prepotency of the instrumental response and thereby yield a
stronger habit. Sheffield has termed his approach drive-induction theory since
the evocation of consummatory excitement by conditioned stimuli, rather than a
reduction of excitement as postulated by drive-reduction theory, is held to
underlie instrumental learning.

Evidence against the view that need reduction or drive reduction is necessary
for primary reinforcement was offered by Sheffield and Roby (1950), who
demonstrated a non-nutritive, sweet liquid, saccharin, to be rewarding [...].
The relatively long-[...] performance continued to improve over 14 days [...]
secondary reward value of saccharin. The finding was taken as evidence that
there does not have to be need reduction or drive reduction for drinking to
serve as a primary reward. Further, Sheffield, Wulff, and Backer (1951) showed
that copulation without ejaculation was reinforcing for sexually naive male
rats. Since the males were sexually naive, the reward value of incomplete
copulation could presumably not be attributed to secondary reinforcement
properties resulting from previous ejaculations, and it was argued that sexual
activity may serve as a primary reward even without sex-drive reduction.

The preceding classic experimental investigations by Sheffield provided early
indication that a stimulus does not have to be need reducing to serve as a
primary reward. However, Sheffield’s findings are not, in themselves, damaging
to drive-reduction reinforcement theory if drive reduction is defined
independently of tissue-need reduction, as has been done by Neal Miller (Miller,
[...]). Nevertheless, Sheffield's drive-induction reinforcement mechanism
provides an alternative to drive-reduction theory. Drives postulated to explain
non-need-reducing rewards (e.g., boredom, Myers & Miller, 1954) may be included
in the drive-[...] analysis.

According to Sheffield's theory, the magnitude of instrumental performance
should be an increasing function of consummatory excitement. Sheffield initially
believed that the index of consummatory excitement was the vigor of consummatory
activity in the presence of the innate consummatory stimulus. Sheffield et al.
(1951) examined instrumental runway speed as a function of the number of
intromissions, without ejaculation allowed, performed in the goal box by the
male rats. In support of Sheffield’s position, runway speed and amount of
copulation were found to be positively correlated. Sheffield, Roby, and Campbell
(1954) examined speed of runway traversal by rats for a nutritive solution [...]
or nonnutritive solution (saccharin). Here also, performance was found to be an
increasing function of the rate of consummatory responding, the two solutions
producing a single function relating running speed to drinking rate. More
recently, however, Sheffield (1966) has suggested that consummatory activation,
at the level of the central nervous system, may rise above the physiological
limit of overt consummatory responding. In such cases, according to Sheffield
(1966), the vigor of the instrumental response might provide a better index of
central consummatory activation than would the vigor of the consummatory
response.

Sheffield’s theory has been criticized by Miller ([...]) as having difficulty in
handling findings that stimuli which reduce appetitive drive may be rewarding
with the peripheral consummatory response mechanism bypassed, as in
response-contingent stomach loading by tube with food (Miller & Kessen, 1952).
Extending Sheffield's recent emphasis on central activation, one might assume
that consummatory excitement is channeled mainly into internal components of the
consummatory response, since in the stomach loading by tube of food only the
internal components of consummatory stimulation are present. According to this
account, stimuli preceding the stomach loading of food become classically
conditioned to the internal components of the consummatory response to food.
Presentation of the conditioned stimuli energizes any ongoing behavior, but
favors the instrumental response since instrumental-response-produced stimuli
become stably conditioned to internal components of the consummatory response.

Miller (1965) has also questioned how 
Sheffield’s theory can account for learned increases
in the vigor of the consummatory response itself.
For example, the vigor with
which a new food is eaten may increase grad- 
ually with experience. For Sheffield’s theory to 
deal with learned changes in consummatory 
performance, assumptions are needed about 
which factors, besides deprivation, control 
the amount of excitement channeled into the 
consummatory response. 

According to Sheffield’s theory, variables 
which heighten drive should increase the ex- 
citement channeled into the consummatory 
response and should improve instrumental 
learning. Sheffield (1965, 1966) recognized 
that his index of consummatory excitement, 
the vigor of the consummatory response, will 
be valid only up to the level of drive which
causes the organism to perform the consummatory
response with the maximum vigor physiologically
possible. It has additionally been
found, however, that certain variables pre- 
sumed to alter drive act to change the strength 
of instrumental performance in the direction 
opposite to the change in consummatory 
performance. For example, inconsistency in 
the direction of change for instrumental and 
consummatory performance has also been re- 
ported using a number of procedures designed 
to alter thirst (Heyer, 1951; Miller, 1956, 
1961; O'Kelly & Heyer, 1948, 1951). The
assumption that vigor of the consummatory 
response reflects consummatory excitement 
thus provides incorrect predictions of instrumental
performance in a number of situations.
There is, of course, no reason why a drive- 
induction theory or drive-reduction theory 
must assume a perfect correspondence between
changes in instrumental performance
and changes in consummatory performance. 


PREPOTENCY THEORY 
Conceptualizing reinforcement as a relation between the reinforcing response and
reinforced response, Premack (1959, 1965) proposed that any response may serve
as a reinforcer and that the essential condition for reinforcement is the higher
free-performance probability of the contingent response than
instrumental response. This generalization is
inconsistent with drive-reduction theories of 
reinforcement (Hull, 1943; Miller & Dollard, 
1941, p. 30), which contain assumptions 
closely related to the “trans-situationality” 
assumption that a stimulus shown to reinforce
a response will reinforce any other response
capable of being reinforced (Meehl, 1950).

The assumption that any more probable 
response will reinforce any less probable re- 
sponse led to the following experimental dem- 
onstrations. To show that a consummatory 
response could itself be reinforced, Premack 
(1959) first recorded the amounts of candy 
eating and the pinball-machine manipulation 
by children in a period during which both 
behaviors could be freely performed. In ac- 
cord with Premack’s theory, children with a 
higher free-performance probability of ma- 
nipulating than eating subsequently increased 
eating above free-performance level for the 
opportunity to manipulate and showed little
change in manipulation for the opportunity 
to eat. Also, those children with the higher 
probability of eating than manipulating sub- 
sequently increased manipulating for the op- 
portunity to eat and showed little change in 
eating for the opportunity to manipulate.
Further, using a Cebus monkey, manipula- 
tory responses were shown to reinforce other 
less probable manipulatory responses, but not 
more probable manipulatory responses (Pre- 
mack, 1963). And to show that the reinforce- 
ment relation is reversible for the individual 
animal, parameters were manipulated to make 
activity-wheel running by rats more probable 
than drinking, and using the same subjects 
at another time, drinking more probable than 
running. In both situations, upon instituting 
a contingency, the more probable response 
reinforced the less probable response (Pre- 
mack, 1962). 
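
The rule tested in these demonstrations can be stated compactly. The following is only a minimal sketch of that rule as described above: the function name, the response labels, and the probability values are hypothetical and are not taken from Premack's reports.

```python
def premack_predicts_reinforcement(free_prob, instrumental, contingent):
    """Return True when the contingent response is predicted to reinforce
    the instrumental response, i.e. when its free-performance probability
    is the higher of the two (the rule as summarized in the text)."""
    return free_prob[contingent] > free_prob[instrumental]


# Hypothetical free-performance probabilities for one child.
child = {"eat_candy": 0.2, "play_pinball": 0.6}

# Pinball is more probable, so access to pinball should reinforce eating,
# while access to candy should not reinforce pinball playing.
print(premack_predicts_reinforcement(child, "eat_candy", "play_pinball"))
print(premack_predicts_reinforcement(child, "play_pinball", "eat_candy"))

# Reversibility: with the probabilities exchanged, the prediction reverses.
reversed_child = {"eat_candy": 0.6, "play_pinball": 0.2}
print(premack_predicts_reinforcement(reversed_child, "play_pinball", "eat_candy"))
```

The sketch treats the relation as a simple comparison of two probabilities, which is all the generalization itself asserts; it says nothing about the size of the reinforcement effect.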

Miller (1963) pointed out a serious flaw in 
Premack’s theory. Miller argued that the 
theory makes the incorrect prediction that 
rats should press a bar which turns on intense shock and thus elicits some
highly probable prepotent response (e.g., crouching). Since crouching behavior
fits the requirement of being elicited with high probability in the
free-performance situation, the theory does indeed make such a prediction. It is
thus clear that Premack's view of positive reinforcement simply as a relation
between any less probable instrumental response and any more probable contingent
response incorrectly predicts highly aversive stimulation to be rewarding. The
objection is obviated by assuming that there is a class of responses
(“appetitive responses”) for which Premack's generalization holds and a class of
responses (“aversive responses”) for which it does not.

Recent research has indicated that the higher free-performance probability of
the contingent response than instrumental response may not be a necessary
condition for reinforcement. Eisenberger et al. (196[...]), testing humans with
manipulatory responses, reported that under some circumstances, low-probability
responses would reinforce high-probability responses. The essential condition
for reinforcement appeared to be simply the organism's necessity of increasing
instrumental responding if it is to maintain contingent responding as close as
possible to the free-performance level. However, even this condition does not
seem to be necessary for reinforcement. For example, in a previously mentioned
experiment of Premack's (1962), rats were rewarded for licking a tube by the
opportunity to run in an activity wheel. Five licks unlocked the wheel for
10-second periods. Examination of Premack’s data reveals that each animal's
asymptotic lick rate was three to five times greater than the free-performance
lick rate, even though the free-performance number of licks, if optimally
distributed over time, would have been great enough to maintain running at the
free-performance level. It will be interesting to see whether future research
will reveal any condition necessary for reward learning beyond the simple
stimulus-reward [...] assumed sufficient by many contemporary incentive
theories.

DISCUSSION

Most of the theories discussed suppose that some sort of stimulus variation is
responsible, at least in part, for the magnitude of exploratory-curiosity
behaviors and the reward value of the related evoking stimulation. The review
has pointed out some of the conceptual, empirical, and methodological issues
which seem most basic at this time for testing and extending these theories.
The concept of organism-experienced variation
is very inclusive. Compatible with most
of the theories, for example, is the notion that 
an unfamiliar stimulus having few perceptible 
elements may afford more variety than a very 
familiar stimulus having a somewhat greater 
number of perceptible elements. On the other 
hand, application of the variation concept is 
ambiguous for many situations. For example, 
the confusion of attempting to apply Glanzer’s 
(1958) theory to situations in which animals 
face a choice between a heterogeneous but 
familiar stimulus and a homogeneous but unfamiliar stimulus extends to all the
other variation theories. Dember and Earl (1957) attempted as a solution to the
definitional problem the determination of the organism's expectancies from the
attentional and preference behaviors. Because of insufficient theoretical
elaboration, however, strict adherence to this approach allows the Dember-Earl
theory to do no more than explain, after the fact, virtually all the behavior
the authors intend to predict.
Among the more general empirical issues [...] to current theorizing are the
following: (a) Does the degree of sensory variation in one sensory modality or
stimulus dimension affect the reward value of variation in other sensory
modalities or stimulus dimensions? (b) How does the degree of recently
experienced sensory variation affect such [...] as eating and [...] when the
latter are dominant? (c) How do such [...] states as hunger, thirst, and pain
affect the reward value of sensory variation and affect the performance of
dominant and nondominant exploratory-curiosity behaviors? (d) How do the reward
value of stimulus variation and the magnitude of exploratory-curiosity behaviors
change as a function of short-term and long-term preadaptation conditions which
range from great stimulus homogeneity (or very low intensity) to great stimulus
heterogeneity (or very high intensity)? (e) How general is the finding that with
continued exposure to a set of stimuli differing in variation, any change in
preference will be an increased preference for
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more variable stimulation? (f) Given a suf- 
ficiently long duration of forced exposure to 
a preferred heterogeneous stimulus, will the 
organism come to prefer a relatively unfamil- 
iar but homogeneous stimulus? (g) How do 
approach and avoidance tendencies to a novel 
stimulus become changed as a function of the 
durations of stimulus presentation and withholding?
(h) What is the relationship between
unexpectedness and resultant anxiety, and 
how does anxiety affect the reward value of 
unfamiliar stimuli and affect the magnitude 
of exploratory curiosity behaviors? 

The reward value of many types of non- 
need-reducing stimulation is considerably 
weaker than that of the traditionally studied 
rewards, food and water for deprived animals. 
When several conditions of non-need-reducing 
stimulation are compared, care thus must be 
taken to differentiate differences in perform- 
ance due to reward from differences evoked as 
unlearned or previously learned reactions to 
the situation. That differences in stimulus 
contact may not, in general, be isomorphic 
with differences in reward value is suggested 
by a number of such instances found for 
need-reducing rewards. Essential controls are 
most notably lacking in many studies of loco- 
motor exploration and free-operant light-con- 
tingent performance. 

Another problem is the apparent lack of 
concern by many experimenters studying sen- 
sory variation with the interpretability of 
their results in terms of current theory. For 
example, many recent investigations of the 
effect of differential sensory variation during 
early rearing upon later exploratory behavior 
have continued to employ rate of maze-unit 
traversal as the sole dependent variable 


despite the difficulty of separating out those com- 
ponents of the situation which are stressed in con- 
temporary learning theory: (a) the cue or signal or 
stimulus-complex which evokes a learned act, (b) 
the behavior being acquired, and (c) the events 
serving to reinforce the behavior [A. K. Meyers, 
personal communication, March 1971]. 


The conclusion by an experimenter that some 
independent variable has, or has not, a general
effect on “exploration” is of limited utility
when the dependent variable cannot be
readily interpreted using contemporary theories.

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A few of the theories stress the organism’s 
overt physical action as a basic component of 
reward (Premack, Sheffield). That a variety 
of stimuli have been found rewarding with 
the peripheral response mechanism bypassed 
suggests that the generality of application of 
these theories is limited. Nevertheless, there 
is no reason to assume in advance that theo- 
rizing which stresses the reward value of overt 
behavior will not be productive. Unfortu- 
nately, the response-oriented theories dis- 
cussed do not seem to have been very success- 
ful. Thus, in place of an earlier emphasis on 
arousal produced by overt consummatory re- 
sponding, Sheffield has more recently empha- 
sized consummatory activation at the level of 
the central nervous system. 
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PSYCHOLOGICAL SIGNIFICANCE OF PUPILLARY MOVEMENTS 


BRAM C. GOLDWATER


University of Victoria 


Pupillary dilation, the light reflex, and spontaneous fluctuations in pupil size
have been used as dependent variables in psychological investigations. A review
of these studies provides evidence for the effectiveness of the pupil as an
index of autonomic activity in psychophysiological research. Methodological
problems in the pupillary literature are discussed, and directions for further
research are suggested.


Changes in pupil size are under the control of two smooth muscles in the iris.
The sphincter pupillae, located in the stromal layer, is under cholinergic
control, mediated via parasympathetic nerves from the Edinger-Westphal nucleus.
The dilator pupillae, situated posterior to the constrictor muscle, is
innervated by adrenergic fibers originating in the superior sympathetic
ganglion. This set of opposing muscles exercises a fine but extensive control
over the pupil; pupil diameter can range from 1.5 to more than 9 millimeters in
man, and can react to stimulation in as little as .2 seconds (Lowenstein &
Loewenfeld, 1962). Although pupillary movements, particularly the light reflex,
have been studied for many years, their usefulness as indexes of psychological
phenomena has recently begun to attract particular attention among
psychologists. The aim of this study is to indicate, by means of a review of the
literature, the ways in which pupil size may be employed as a dependent variable
in psychological research and to examine some of the difficulties involved.

Interest in the pupil as a psychophysiological variable has centered largely on
pupillary dilation, and these studies are reviewed first. A few investigations
have been concerned with the effects of psychological variables on the light
reflex, while others have examined spontaneous fluctuations in pupil size.


PUPILLARY DILATION

Pupil dilation independent of changes in illumination is scarcely a new
phenomenon, having been recognized as early as [...] (Loewenfeld, 1958, p. 204).
Complete agreement as to the underlying mechanisms of these movements, however,
has yet to be reached (see Loewenfeld, 1958, for a comprehensive review of the
issues and controversies involved). Lowenstein and Loewenfeld, on the basis of
extensive research (Loewenfeld, 1958; Lowenstein & Loewenfeld, 1950a, [...],
1962), have attributed psychosensory pupil dilation to four mechanisms: active
sympathetic pathways to the dilator muscle; an inhibitory mechanism acting upon
the Edinger-Westphal nucleus, the reflex center for constriction; and two
adrenergic humoral mechanisms, one adrenal epinephrine, and the other a
nonadrenal adrenergic substance. Recent evidence (Schaeppi & Koella, [...])
suggests that the physiological basis of pupillary phenomena may be more complex
than this.


Affect, Emotion, and Attitudes

Lowenstein (1920) reported that pupillary dilation accompanied such
suggestion-induced states as “excitement,” “comfort,” “pleasure,” and
“displeasure,” as well as suggestions of impending pain and threat. This study
involved only one subject, apparently a catatonic schizophrenic, whose pupillary
movements were observed with the [...]. Berrien and Huntington (1943), in a
relatively early study, measured pupil diameter by adjusting cross hairs on a
short-focus telescope. Measuring pupillary responses in an experimental
lie-detection situation, they found pupil dilation in response to some 50% of
the critical words, as compared to about 15% of the neutral words.

Recent interest in the pupil among psychologists is due in large part to the
work of Hess and his associates at the University of Chicago. In the first of
these studies, Hess and Polt (1960) measured pupillary changes to pictorial
stimuli in four men and two women. Pupillary responses were scored in terms of
the percentage increase in pupil size to a test slide over a preceding control
slide. The authors claimed to have processed the slides in such a way as to
minimize brightness contrast within pictures and to attempt to keep brightness
constant across control and test slides. Women, on average, gave larger dilation
responses to pictures of a baby, mother and baby, and male nude, while men
showed greater responses to a picture of a female nude. In another study, Hess,
Seltzer, and Shlien (1965) found that pupillary responses discriminated between
heterosexual and homosexual males. Four of the five homosexuals showed a greater
pupil dilation to pictures of male nudes than to pictures of female nudes, while
all five heterosexuals’ responses were greater for female nudes. Hess (1965)
reported that dilation responses to pictures of food differentiated between
food-deprived and non-food-deprived subjects, with responses of the former being
“more than two and a half times larger” than those of the latter. Hess has
interpreted pupil dilation in these kinds of situations as an index of
“interest,” “emotion,” and “motivation.” He has successfully employed it as a
dependent variable in conjunction with such dimensions as taste (Hess, 1965;
Hess & Polt, 1966) and musical preference (Hess, 1965).
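
The percentage scoring described for Hess and Polt (1960) amounts to a simple ratio. The sketch below is an illustration only: the function name and the example values are hypothetical, and it assumes that "pupil size" is expressed as a single positive number (for instance, a diameter in millimeters), which the summary above does not specify.

```python
def percent_pupil_change(control_size, test_size):
    """Percentage change in pupil size to a test slide relative to the
    preceding control slide. Positive values indicate dilation and
    negative values indicate constriction."""
    if control_size <= 0:
        raise ValueError("control_size must be positive")
    return 100.0 * (test_size - control_size) / control_size


# Hypothetical measurements: 4.0 on the control slide, 5.0 on the test slide.
print(percent_pupil_change(4.0, 5.0))   # dilation relative to control
print(percent_pupil_change(5.0, 4.0))   # constriction relative to control
```

Scoring each test slide against its own preceding control slide expresses the response relative to the subject's pupil size at that moment, which is presumably why a percentage rather than an absolute change was reported.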

Other investigators have followed Hess in using pupil size to measure interest
or attractiveness with respect to different stimuli. Fitzgerald (1968) reported
greater pupillary dilation in infants to pictures of human faces than to
geometric shapes and patterns, and to a picture of the mother's face as compared
to that of a stranger. Koff and Hawkes (1968), presenting photographs of
classmates to sixth graders, tried unsuccessfully to correlate pupillary
responses with sociometric ratings. Simms (1967), in a study of the pupillary
reactions of males and females to pictorial slides of males and females, found a
significant interaction of sex of viewer with sex of slide, indicating a greater
pupillary response to slides of the opposite sex. Their results also
demonstrated that subjects dilated maximally to opposite-sex slides with
enlarged pupils, and least to like-sex slides with enlarged pupils. This latter
finding was interpreted in terms of the assumption that pupillary dilation in
others is perceived as an indication of interest. Bernick, Kling, and Borowitz*
found a significant correlation between percentage of change in pupil size and
the amount of reported erection occurring during viewing of an erotic film by
males with heterosexual histories. Heart rate and corticosteroid levels, which
were also measured, showed less systematic changes, and correlated neither with
pupil size nor with reported erection.

Color, reputed to evoke emotional re- 
sponses, was used as an independent variable 
by Miller (1966), who compared pupil changes 
in response to red, blue, green, and gray 
slides of equal intensity. He found that color 
slides were rated as more emotional and, fol- 
lowing an initial constriction, evoked sig- 
nificantly greater pupil dilation than the 
neutral gray slide. The use of color as an in- 
dependent variable seems rather unfortunate, 
since its physical and psychological effects 
may have been confounded; it has been dem- 
onstrated that, other things being equal, the 
wavelength of a light stimulus can influence 
pupil size (Bouma, 1962).

A number of attempts have been made to employ pupil size as an index of
preference with a view to the possible application of pupillography to marketing
research. Krugman (1964) correlated pupil size with [...] and sales ratings of
greeting cards and sterling silver patterns. Only the correlation between
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ranked pupil size and sales rank of the silver 
patterns reached significance, possibly re- 
flecting the low level of affect attached to 
these stimulus materials. In another study 
(Krugman, 1965), pupil size was found to 
differentiate between two television commer- 
cials, but no corresponding differences were 
found in verbal ratings. 

Hess (1965) claimed that while “uninterest- 
ing” or “boring” pictures had no systematic 
effect on pupil size, “distasteful” or “unap- 
pealing” stimuli (e.g., pictures of cross-eyed 
children) were associated with constriction. 
Stimuli highly aversive in content (e.g., pic- 
tures of corpses in a concentration camp), 
though initially causing pupil dilation (and 
concomitant galvanic skin responses), reportedly
evoked pupillary constriction on repeated
exposures. This notion of constriction
responses to aversive stimuli runs counter to 
the prevailing opinion that emotional reac- 
tions, regardless of their affective quality, 
tend to elicit predominantly sympathetic ac- 
tivity (in this case, therefore, pupil dilation). 
Loewenfeld (1966) has asserted that “all 
psychologic and sensory stimuli, with the ex- 
ception of light, dilate the pupil and none of 
them contract it [p. 294].”

Woodmansee (1965) attempted to replicate 
Hess's constriction findings. In a carefully 
controlled study, he categorized 22 college 
co-eds as "equalitarian" or “anti-Negro” ac- 
cording to their scores on a multifactor Racial 
Attitude Inventory, and compared their pupil- 
lary responses to “racial content" photo- 
graphs. Postexperimental ratings showed that 
the pictures evoked greater dislike from the 
anti-Negro subjects. The pupillary responses 
of the two groups differed in the predicted 
direction only on the first of eight trials, and 
here, although the average response of the 
anti-Negro females was in the direction of
constriction, it departed only minimally from 
zero, and was not significant. In a second 
study (Woodmansee),* 14 co-eds viewed a photograph of the scene of a brutal
murder recently committed on [...]. The responses of all subjects, who [...]
felt “repulsed, disgusted, or sickened” by the picture, consisted of dilation,
with the exception of one nonsignificant score in the other direction.

Other investigators have found pupil dila- 
tion to affective stimuli, but have, in gen- 
eral, provided little support for Hess's find- 
ings of constriction to negative stimuli. 
Peavler and McLaughlin (1967) found
significantly greater pupil size in response to a
female nude, as compared to other more
neutral pictorial stimuli. Pupil size did not,
however, differentiate among words varying 
on rated dimensions of either good-bad or 
neutral-very important. Vacchiano, Strauss,
Ryan, and Hochman (1968) measured pupil
size during visual exposure to words of high,
neutral, or low value on the
Allport-Vernon-Lindzey scale. The one index of
pupillary response that significantly
differentiated among value categories indicated
greater pupil size to low-value words than to
neutral or high value. Changes in pupil size
upon exposure of a low-value following a
high-value word as frequently consisted of
dilation as constriction. Polt and Hess (1968),
measuring pupillary reactions to the visually
presented words “hostile,” “squirm,” “flay,”
and [...], found dilation and constriction
responses equally divided among 15 subjects.
The possibility of an individual difference
factor underlying these results could not be
evaluated because of the absence of any reports
of consistency within subjects. Collins,
Ellsworth, and Helmreich (1967) had subjects
rate several verbal and pictorial (both visual)
stimuli on eight evaluative, three activity, and
[...] potency scales of the Semantic
Differential. A significant pooled, within-class
correlation between pupil size and semantic
differential ratings was obtained only for the
[...] dimension. Nunnally, Knott, Duchnowski,
and Parker (1967) found significantly greater
pupil size during viewing of a slide of a
girl in states of partial undress, as
compared to a slide of her fully clothed. These
investigators also compared pupil size during
exposure of pictorial slides rated on a
pleasantness dimension. “Very pleasant” slides
were associated with significantly greater pupil
diameter than either neutral or “very
unpleasant” slides, but the neutral and
unpleasant pictures did not differ significantly
from each other. Nunnally et al. (1967) also
examined pupillary changes in anticipation of
a previously experienced loud gunshot (the
threat of gunshot was attached to the third 
of a series of five numbered slides). The pupil 
showed clear increases in diameter as the 
slide series approached the critical third slide, 
followed by a return to initial size. Given the 
intensity, and therefore presumed noxiousness,
of the gunshot, this finding is in
contradiction with Hess’s prediction. Finally,
Guinan (1966) found emotional word slides 
to be associated with significantly greater 
pupil size than neutral ones. The three
emotional stimuli, consisting of the words
“vomit,” “sex,” and “kiss,” did not differ from
one another. Hess's hypothesis received apparent
support from a study by Hutt and Anderson
(1967), who obtained independent measures
of pupil size and tachistoscopic recognition
threshold for emotional and neutral word
slides. Subtraction of the neutral word pupil
size and threshold values from the
corresponding emotional word values yielded a
difference score on each measure. The correlation
between these difference scores was small but
significantly less than zero, a result in line
with the authors' hypothesis that perceptual
defense in response to emotional material is
mediated by pupillary constriction. However,
no independent evidence is provided to show
that subjects actually showed pupillary
constriction to the emotional words. Lehr and
Bergum (1966) also found differences in line
with Hess's data; pupil size associated with
“very pleasant” pictorial slides was found to
be slightly but significantly greater than that
associated with “unpleasant” slides. The
authors, however, did not present evidence for
constriction to the unpleasant stimuli, nor did
they find any significant correlation between
pupil size and verbal ratings of
pleasantness-unpleasantness. Furthermore,
because of missing data, pupil size was compared
across categories of very pleasant and
unpleasant, instead of between very pleasant and
very unpleasant, thus confounding the effects of
the direction of affect with those of its
intensity. A study that did report clear findings
in line with Hess’s predictions was carried out
by Barlow (1969). Pictorial slides of Lyndon
Johnson, George Wallace, Martin Luther
King, and an unknown white were presented
to individuals classified as liberal or
conservative on the basis of interview data. The
groups differed significantly with respect to
pupillary reactions across the three slides of
known political figures, with liberals
demonstrating dilation responses to Johnson and
King and constriction to Wallace, and
conservatives showing the opposite pattern.

On the basis of experimental findings alone, 
one must question Hess's hypothesis, since 
the bulk of the evidence fails to support his 
position. Along with this experimental question,
however, is a methodological issue that has
critical implications not only for the Hess 
controversy but for pupillometric research in 
general. This has to do with the doubtful 
validity of using pupil size as a measure of 
psychological reactions to visual stimuli. 

The primary function of a mobile pupil is 
to regulate the amount of light entering the 
eye and to make adjustments that enhance 
visual acuity. It follows that pupillary changes 
may be elicited by the purely physical as 
well as the psychological characteristics of 
the same visual stimulus and that these two 
effects may be hopelessly confounded. Several 
facts suggest that Hess's findings were open 
to confounding in this way. Woodmansee 
(1966), employing a stimulus prepared ac- 
cording to Hess's technique, found significant 
pupillary constriction with shifts in gaze from 
darker to brighter areas of the picture. This 
points up the vulnerability to confounding 
even of intrastimulus comparisons, with
individual differences in pupil size being
attributable to differences in point of regard.
Woodmansee further suggested that a dis- 
tance of 3-4 meters is required to minimize 
the subject's difficulty in maintaining his 
focus on the stimulus and thus to avoid ac- 
commodation (pupillary constriction) effects; 
Hess typically employed a distance of about 
80 centimeters (Hess et al., 1965), and most 
of the above investigators, including Barlow 
(1969), have used Hess's apparatus as a 
model. Of particular interest is Hess's own 
report that constriction responses were found 
only with visual stimuli; unpleasant-tasting
liquids and disliked musical selections
consistently evoked dilation (Hess, 1965).

Such methodological criticisms apply to any
study in which pupillary responses are em- 
ployed to measure the psychological impact of 
visual stimuli. It is difficult to escape the con- 
clusion that visual stimulation is inappropri- 
ate in this type of pupillometric research. 
Loewenfeld (1966) indicated the extent of the 
problem by pointing out that factors of 
brightness, color, area, retinal distribution, 
and accommodation may all be involved in 
the pupillary reaction to a visual image. 
Other variables affecting the pupillary light 
reflex further complicate the picture (Lowen- 
stein & Loewenfeld, 1959, 1961b). The claim 
that these influences can be effectively con- 
trolled is open to serious doubt, particularly 
in the case of complex pictorial or three-di- 
mensional material. The validity of all of the 
above studies, which employed visual stimula- 
tion, may thus be questioned, and caution is 
advised in any commercial application of 
pupillography. It is clearly desirable, wherever 
possible, to restrict the stimuli used in pupil- 
lary research of this kind to nonvisual mod- 
alities. A rather ingenious example of such 
an approach is the suggestion by Nunnally 
and Rileigh * that one measure pupillary re- 
sponses to the anticipation of particular 
classes of visual displays, rather than to the 
displays themselves. This would circumvent 
many of the problems involved with visual 
stimuli. At the same time, it should be
remembered that regardless of the modality
employed, care must be taken to maintain a 
constant level of illumination and a constant 
fixation point for the subject. 


Sensory Stimulation and Muscular Activity 


Stimulation of all sensory nerves is
reported to evoke pupil dilation.


It was recognized at an early period of physiological 
research that stimulation of any sensory nerve in 
the body will elicit bilateral pupillary dilatation. The 
reaction is so sensitive and reliable that it was often 
used as an indicator for sensation in physiological
experiments [Loewenfeld, 1958, p. 277].


Stimuli employed by Lowenstein and Loewenfeld
to evoke pupillary dilation have included
a pistol shot, sudden barks, abrupt blowing
into an animal's face, a pinprick, and a
squeeze of the tail (Loewenfeld, 1958). Pupil
dilation to painful stimuli has been frequently
reported (Gellhorn, 1953; Kuntz & Richins,
1946; Lowenstein, 1920; Ury & Gellhorn,
1939).

* J. C. Nunnally & W. J. Rileigh. Pupillary
response in relation to anticipation of
emotion-provoking events. Paper presented at the
meeting of the American Psychological
Association, Washington, D. C., September 1967.

BRAM C. GOLDWATER

Pupil dilation has been reported as one of
the components of the orienting reflex, the
response to novelty or stimulus change. Razran
(1961), reviewing the Russian literature,
stated that pupil dilation is “almost
invariably” the first reaction to nonvisual
stimulation. It has even been claimed that, on
rare occasions, the pupil will dilate to a sudden
increase in illumination, overriding the normal
light reflex (Sokolov, 1963, p. 99). Nunnally
et al. (1967) examined pupil size in connection
with two sets of “novel” and “nonnovel”
pictures. Pupil diameter associated with both
sets of novel stimuli was larger than for the
corresponding nonnovel stimuli, although the
difference reached significance only for the
first set.

Clynes (1962) investigated the “auditory
pupil reflex,” employing as stimuli 500
cycles-per-second tones and clicks of moderate
and low intensity. Averaging over 20-100
responses, he described the dilation responses
as small in amplitude (about .5 millimeters),
with a latency as small as .15 second, a peak
of dilation reached in about 1.5 seconds, and
a subsequent recontraction phase that lasted
as long as 20 seconds. In apparent
contradiction with orienting response theory,
Clynes reported no fatigue or habituation of the
response after several hundred trials. It is not
clear whether or not this may reflect the
method of scoring used, habituation being
evaluated in terms of blocks of 50 responses;
there might conceivably have been some
diminution of response amplitude, with
spontaneous recovery, within blocks (see the
discussion of “overextinction” in Sokolov,
1963, p. 120). A second anomaly has to do
with Clynes’s statement that the pupil
response to moderate sound “appears to be a
specific reaction and its form is not produced
by other stimulation such as an unexpected
touch stimulation of the hand (Clynes, 1962,
p. 836).” The orienting response literature
seems to imply that the orienting reflex is
uniform across modalities. And Loewenfeld
(1958) stated explicitly,

[...] stimuli, intense emotions and spon[...]
affect the normal pupil precisely in [...]
these modes of stimulation [...] the features of
the resulting reflexes are alike [p. 321].

Unfortunately, there is insufficient data on
the form of the pupillary reflexes to different
modes of stimulation to adequately evaluate
either of these positions.
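
The scoring concern raised above about Clynes's data can be made concrete with a small numerical sketch. It assumes, purely for illustration, that response amplitude declines linearly within each block of 50 responses and recovers fully at the start of the next block; the starting amplitude of .5 millimeters follows the value reported above, while the end-of-block amplitude and the function names are hypothetical.

```python
from statistics import mean

def simulate_amplitudes(n_blocks=4, block_size=50, start=0.5, end=0.3):
    """Hypothetical dilation amplitudes (mm): linear decline inside each
    block, full spontaneous recovery at the start of the next block."""
    step = (start - end) / (block_size - 1)
    one_block = [start - i * step for i in range(block_size)]
    return one_block * n_blocks

def block_means(values, block_size=50):
    """Mean amplitude per block, i.e., the scoring unit described above."""
    return [mean(values[i:i + block_size])
            for i in range(0, len(values), block_size)]

def within_block_change(values, block_size=50):
    """Last-minus-first amplitude inside each block."""
    return [values[i + block_size - 1] - values[i]
            for i in range(0, len(values), block_size)]

amps = simulate_amplitudes()
means = block_means(amps)            # identical from block to block
changes = within_block_change(amps)  # a clear decline inside every block
```

Under these assumptions the block means show no trend at all, although every block contains a decline of about .2 millimeters, which is the situation the text describes as possible but undetectable with block scoring.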

Nunnally et al. (1967), comparing pupil
size for an ascending and descending series
of four tone intensities, found some evidence
for both habituation and intensity effects.
Their mode of analysis did not, however, permit
evaluation of phasic pre- to poststimulus
changes in pupil diameter. Beck * found
evidence for a relationship between pupil size
and intensity of auditory clicks, although,
with three different intensities, the
relationship was curvilinear, maximal pupil size
being associated with the intermediate
intensity. He also demonstrated a relationship
between pupil size and rate of click stimulation
that very closely approximated the curve
relating voltage and frequency of cortical
rhythms. The author inferred that background
noise from experimental equipment could be
a source of artifact in pupillary research and
should be controlled.


Evidence has been presented for pupil dilation
as a concomitant of induced muscle tension.
Nunnally et al. (1967) presented graphic
evidence for pupil dilation in response to
weight lifting, with greater dilation occurring
with increasing weights.


* Beck. The effect of the rate and intensity
of auditory click stimulation on pupil size.
Paper presented at the meeting of the American
Psychological Association, Washington, D. C.,
1967.


Mental Activity and Attention


Lowenstein (1920) claimed that dilation of
the pupil can be observed “with every increase
in attention by intellectual processes
of every kind [p. 194].”

Changes in pupil size as a function of “mental
activity” were studied by Hess and Polt
(1964), using four verbally presented
multiplication problems of varied difficulty. Graphs
of pupil size for each of the five subjects 
typically showed a gradual dilation, beginning 
with the presentation of the problem and 
reaching its peak immediately before the sub- 
ject's verbal report, followed by constriction 
to original size. The mean extent of dilation 
was directly related to degree of problem dif- 
ficulty. 

Subsequent studies have provided repeated 
evidence for the ability of pupil size to re- 
flect mental activity and to differentiate among 
different levels of task difficulty. Investiga- 
tions have been carried out using mental 
arithmetic (Bradshaw, 1967, 1968b; Payne, 
Parry, & Harasymiw, 1968; Schaefer, Fergu- 
son, Klein, & Rawson, 1968), continuous pro- 
cessing tasks (Bradshaw, 1968a), reaction 
time (Bradshaw, 1968c), recall tasks (Elsh- 
tain & Schaefer, 1968), and psychophysical 
judgments (Kahneman & Beatty, 1967). 

Kahneman and his associates have carried 
out a series of tightly designed studies using 
pupil size as an index of "processing load," 
following a paradigm in which pupillary 
measurements are carefully time locked to 
paced performance of mental tasks. These 
experiments have demonstrated a character- 
istic response pattern of the pupil, consisting 
of dilation as material is presented to the 
subject for processing, and constriction as his 
report of the solution signals completion of 
the task or trial. This type of function was 
originally described for short- and long-term 
memory tasks (Beatty & Kahneman, 1966; 
Kahneman & Beatty, 1966) and for a digit- 
transformation task, in which subjects had to 
add 1, 2, or 3 to each of four digits, and then 
recall the transformed series (Kahneman & 
Beatty, 1966).
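
The time-locked measurement described above can be illustrated with a minimal sketch. It is not a reconstruction of Kahneman and Beatty's procedure; it assumes only that each trial yields pupil diameters sampled at the same points relative to the paced task, that the first samples of each trial serve as a baseline, and that trials are averaged point by point. The data values and function names are hypothetical.

```python
from statistics import mean

def baseline_correct(trace, n_baseline):
    """Express each sample as change from the trial's own baseline mean."""
    base = mean(trace[:n_baseline])
    return [x - base for x in trace]

def time_locked_average(trials, n_baseline):
    """Average baseline-corrected traces sample by sample across trials."""
    corrected = [baseline_correct(t, n_baseline) for t in trials]
    return [mean(samples) for samples in zip(*corrected)]

def peak_dilation(avg):
    """Return the largest mean dilation and the sample index where it occurs."""
    peak = max(avg)
    return peak, avg.index(peak)

# Two hypothetical trials (diameters in mm), sampled at matching time points.
trials = [
    [4.0, 4.0, 4.2, 4.5, 4.6, 4.1],
    [5.0, 5.0, 5.1, 5.3, 5.6, 5.0],
]
avg = time_locked_average(trials, n_baseline=2)
peak, idx = peak_dilation(avg)
```

Correcting each trial against its own baseline keeps differences in resting pupil size between trials from entering the average, so the averaged curve reflects only the dilation and subsequent constriction tied to the task.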

In a series of follow-up studies, Kahneman 
and his colleagues have attempted to build 
up construct validity for the assumption that 
pupil size is specifically sensitive to the
degree of mental effort in these tasks.
Kahneman, Beatty, and Pollack (1967) had
subjects perform a digit-transformation-and-recall
task and a visual detection task, both
separately and simultaneously. The authors
predicted that performance of the digit task
simultaneously with the detection task would
increase the processing load on the subject,
resulting in both an increase in pupil size and 
a decrement in the detection performance. 
The results showed, as predicted, that per- 
formance on the detection task was signifi- 
cantly better, and pupillary dilation smaller, 
when this task was performed alone. More- 
over, a parallel between the amount of pupil- 
lary dilation and perceptual performance was 
evident under the simultaneous condition, 
with the greatest number of detection misses 
occurring at that point in the digit series 
where pupillary dilation (and, presumably,
mental effort) was maximal. Although
performance on the digit task was inferior under
the simultaneous condition, pupil size did not 
discriminate in this case. Kahneman, Onuska, 
and Wolman (1968) were able to demonstrate 
an effect of grouping upon pupillary dilation 
in a short-term memory task. The essentially 
monotonic increment in pupil size occurring 
during presentation for recall of an ungrouped 
series of digits was supplanted by a scalloped 
function, with pupil size increasing during 
presentation of a group of digits and con- 
stricting between groups. These differences in 
pupillary function corresponded to the
differences in pattern of rehearsal reported by
subjects under the two conditions.

Data from other experiments were used to 
support a distinction between effort and
arousal as processes underlying changes in 
pupil size. Kahneman and Beatty (1967) 
measured pupil size during a pitch-discrimina- 
tion task, in which subjects had to discrimi- 
nate, on each trial, between a “standard” tone 
of constant value and a “comparison” tone 
which varied in frequency. Pupillary dilation 
was significantly greater in response to the 
comparison tone which, as the authors argued,
presents the critical information that the
subject must actively process. The standard
tone, which was repeated on every trial and
which, as a result, required less attention
from the subject, elicited little change in
pupil size. The principal effect on pupil size
was thus attributed to the processing load
under which “an organism is placed . . .” by
[...], as opposed to the arousal generated
directly by the stimulus itself.

Kahneman and Peavler (1969) monitored
pupil size during an association learning task
involving digit-noun pairs. Trials were
presented under conditions of either high or low
reward, contingent upon the odd or even
value of the digit stimulus. On the study 
trials, subjects showed significantly greater 
dilation to high- as compared to low-reward 
nouns; there was no such discrimination in 
the case of responses to the digit stimulus 
items. Since the stimulus items signaled the 
reward value, while the nouns actually had 
to be memorized, these results were
interpreted to mean that pupillary responses were
a function of mental effort rather than
emotionality or arousal. A correlation of .92
between an index of differential pupil dilation
and one of differential recall of high- and
low-reward items also supported the assumption
that the greater dilation to high-reward
nouns was due to the greater effort expended
in learning them.

This work of Kahneman and his associates
represents an attempt to go beyond the mere
demonstration of an effect of mental effort
upon pupil size. They have, through [...]
analysis, sought to uncover those aspects of
the subject’s performance that in fact are
reflected in pupillary responses, thus
validating their conception of the critical
intervening variable.

Krueger * photographed the pupils of
subjects during a visual recognition task.
Stimuli consisted of 20 silhouettes of [...]
objects with portions removed to render
recognition more difficult. He found a
correlation of .85 between pupil size and
recognition difficulty. Gibney * measured pupil
size during a vigilance task, in which subjects
had to respond verbally to the momentary
hesitation of any of three clock pointers.
Magnitude of dilation displayed at the beginning
of the task was significantly related to the
number of subsequent correct detections. It was
[...] when a signal was presented, whether or
not the latter was reported (magnitude of
dilation to reported versus nonreported signals
did not differ significantly). Galvanic skin
response, which was also employed in this study,
did not appear to discriminate as well as pupil
size.

* L. M. Krueger. Pupillary responses to [...]
varied difficulty levels. Unpublished master's
thesis, Purdue University, 1966. (Abstract)

* T. K. Gibney. Vigilance performance and
autonomic nervous system activity. Unpublished
master’s thesis, Purdue University, 1966.

Hakerem and Sutton (1966) recorded pupil
size during a detection task involving report
of the presence of a near-threshold (perceived
50% of the time) light stimulus.
Curves of pupil diameter for trials in which
the light was not seen, for “blank” trials
involving a light at one-tenth threshold energy,
and for nondiscrimination trials in which no
differential report was required were all
similar, consisting of an increase in pupil size,
apparently reflecting a response to instructions.
Those trials in which the stimulus was
reported as seen, however, showed a
substantially larger dilation. Effects of the
motor response employed for the subject’s report
were controlled by counterbalancing; for one-half
of the trials the subject pressed a key to signify
perception of the stimulus, while for the
remaining trials a key press signified his failure
to perceive.

A problem that arises in interpreting the
above studies of mental processes concerns
the possibility that a motor factor might
underlie some of the pupillary changes.
Campos and Johnson have reported (Campos
& Johnson, 1966, 1967; Johnson & Campos,
1967) that both the act of verbalizing and
the preparation to verbalize can effect
consistent increments in autonomic activity. All
of the above studies involved some type of
overt motor response, verbal or otherwise,
that might have contributed to the results.

Simpson and Paivio have examined this
variable in a series of studies exploring the
effect on pupil size of generating images to
abstract and concrete words (Paivio & Simpson,
1966, 1968; Simpson & Paivio, 1966,
1968). The results of these investigations
indicated that the imagery task reliably evoked
pupillary dilation, but that a differential
effect for abstract and concrete words (greater
dilation for abstract) occurred only when
subjects had to make some overt response (e.g.,
a key press) to signal task fulfillment.
Subsequent attempts to specify the factors
underlying these pupillary responses suggested
(a) that the motor responses must bear some
relevance to the mental task in order to
exercise an effect (Simpson, 1969) and (b) that
the effect of the motor response might have
to do with the demand it imposes upon the
subject to make a decision (Simpson & Hale,
1969).

More work is required in order to define 
more precisely the mode of influence of overt 
responses upon pupil size in these contexts. 
Meanwhile, great care must be exercised in 
interpreting these studies. The evidence in 
Simpson and Paivio's work for an interaction 
between the overt response and the
concrete-abstract dimension indicates the need for
particular caution. Thus, Hakerem and Sutton's
(1966) counterbalancing control, whereby a 
key press signified detection of the signal on 
half the trials and failure to detect on the 
other half, was not entirely adequate, since 
the response might have had a differential ef- 
fect under the two conditions. Separate analy- 
ses of an effect of the key-press response for 
detected and nondetected trials would have 
checked out this possibility. A finding with 
particular significance for this problem is 
Kahneman and Peavler’s (1969) report that 
pupillary dilations on paired-associate test
trials were equally large whether or not the
subject responded verbally. This result, to- 
gether with that of Simpson and Hale (1969), 
suggests that an effort variable correlated with 
the overt response, rather than the response 
per se, might be the critical factor. One might 
easily speculate that the requirement to 
overtly signal the completion of a mental 
task could readily influence the degree to 
which the subject actually performed that 
task conscientiously. Such an assumption 
could account for Simpson and Paivio’s find- 
ings in their imagery studies. On the other 
hand, it seems unlikely that Kahneman and 
Peavler’s (1969) subjects made any less of an 
attempt to recall the appropriate response 
item on those trials where they failed to
verbalize a response. It is thus not surprising
that the overt response would not have been
a factor in this case.

Although most of the above studies controlled
for accommodation effects through the
use of a fixation point for the subject, the
importance of these controls should again be
stressed. Kahneman and Beatty (1966) found
a 10% increase in pupil diameter when changing
from a fixation distance of 6 inches to one
of 6 feet. Great care should thus be taken to 
minimize the chance of the subject shifting 
his fixation point or “blurring” his vision. A 
fixation point at optical infinity is likely ideal. 


LIGHT REFLEX 


The pupillary reflex to an increase in light 
intensity is bidirectional, consisting of con- 
striction followed by redilation. The initial 
constriction, beginning with a latency of 
about .2 seconds (Lowenstein & Loewenfeld, 
1962, p. 236), comprises a rapid “primary”
contraction phase of about .4 seconds duration,
followed by a slower “secondary”
contraction period of about .3 seconds (Bartley,
1951, p. 979; Gradle & Ackerman, 1932). 
This is followed by a redilation, fairly rapid 
at first (“primary redilation wave”) then 
diminishing in speed (“secondary redilation 
wave”) (Lowenstein & Loewenfeld, 1950b). 
When the light stimulus is brief, this redila- 
tion ultimately restores the initial pupil diam- 
eter; with a prolonged increase in illumina- 
tion, pupil size levels off at some value in- 
termediate between the initial diameter and 
that reached at the peak of contraction. The 
precise parameters of the light reflex will vary 
as a function of both stimulus and organismic 
properties, but as a rule the reflex tends to 
display this same general form (see Lowen- 
stein, Kawabata, & Loewenfeld, 1964, for a 
list of many of the variables affecting the re- 
sponse). Like the dilation response, the light 
reflex is virtually identical in the two eyes 
under normal conditions, whether or not both 
eyes are stimulated. This adds considerably 
to the ease of measurement.
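
The approximate timing given above for the constriction phases can be laid out as a simple timeline. The sketch below uses only the three durations quoted in the text (a latency of about .2 second, a primary contraction of about .4 second, and a secondary contraction of about .3 second); the names are hypothetical, the values come from different sources and vary with stimulus and subject, and no duration is assigned to the redilation waves because none is stated.

```python
# Approximate durations in seconds, taken from the figures quoted in the text.
LATENCY = 0.2                # light onset to start of constriction
PRIMARY_CONTRACTION = 0.4    # rapid "primary" contraction phase
SECONDARY_CONTRACTION = 0.3  # slower "secondary" contraction period

def time_of_peak_constriction():
    """Time after light onset at which contraction ends and redilation begins."""
    return LATENCY + PRIMARY_CONTRACTION + SECONDARY_CONTRACTION

def phase_at(t):
    """Name the phase of the light reflex at t seconds after light onset."""
    if t < 0:
        return "pre-stimulus"
    if t < LATENCY:
        return "latency"
    if t < LATENCY + PRIMARY_CONTRACTION:
        return "primary contraction"
    if t < time_of_peak_constriction():
        return "secondary contraction"
    return "redilation"

peak_time = time_of_peak_constriction()
```

On these figures, the peak of contraction falls roughly .9 second after the onset of the light, which gives a rough idea of how long a measurement window must extend before redilation can be observed.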

The efferent fibers for the light reflex are 
parasympathetic nerves originating in the 
Edinger-Westphal nucleus and reaching the 
sphincter pupillae via the ciliary ganglion. 
It has already been pointed out that various 
neural and humoral factors, through their in- 
fluence upon both the dilator pupillae and 
the parasympathetic center in the Edinger- 
Westphal nucleus, effect pupillary dilation. 
It is to be expected, therefore, that under
the appropriate conditions of sensory and 
psychological stimulation, these same factors
would exert inhibitory effects upon the light
reflex.

Lowenstein (1920), examining the light reflex
under the experimental conditions described
above, detected, with the naked eye,
a reduction in speed and magnitude of the
response under the suggestion of fear, and
pain due to pressure.

Bender (1933) reported that presentation
of an “emotional” stimulus (e.g., a gunshot,
electric shock, presentation of a white rat)
before or during a 5-10-second light stimulus
tended to reduce the speed and increase the
latency of the light reflex. An emotional
stimulus presented after the light stimulus
tended to induce oscillatory activity in the
pupil. Pain stimuli and alcohol had generally
inconsistent effects.

Gardner (1937) found the light reflex to
be more extensive during speech than during
silence in stutterers, with no comparable
difference in normals. However, since speech
caused pupillary dilation in the stutterer
group, it is possible that the greater
magnitude of contraction was due to differences
in initial diameter.

Rubin (1960) found pupil size as plotted
during the response to light to be smaller
under normal conditions than under stress
(cold-pressor test). The factor of initial
diameter was, however, not examined. He also
compared the extent of pupillary contraction
in psychotics and normals. He found that 18
of 25 psychotics showed either significantly
greater (15 of the 18) or less constriction
than normals, using the mean and standard
deviation of the latter group as a basis for
comparison. No measures of individual
reliability were provided, however, and a study
by Stilson, Haseth, Schneider, Walsmith,
Rogers, and Astrup (1966) failed to replicate
the results. This latter study has been
criticized on methodological grounds by [...]
and Barry (1968).
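
The kind of comparison attributed to Rubin above, in which each patient is judged against the mean and standard deviation of the normal group, can be sketched as follows. The cutoff of two standard deviations is an assumption made for illustration, since the criterion actually used is not stated here, and the data values and function name are hypothetical.

```python
from statistics import mean, stdev

def classify_against_normals(patient_values, normal_values, k=2.0):
    """Label each patient's constriction as greater than, less than, or
    within the normal range, defined as normal mean +/- k standard
    deviations (k is an assumed cutoff, not one given in the text)."""
    m = mean(normal_values)
    s = stdev(normal_values)
    labels = []
    for v in patient_values:
        if v > m + k * s:
            labels.append("greater")
        elif v < m - k * s:
            labels.append("less")
        else:
            labels.append("within")
    return labels

# Hypothetical extents of constriction in millimeters.
normals = [1.0, 1.2, 0.8, 1.1, 0.9]
patients = [1.5, 1.0, 0.5]
labels = classify_against_normals(patients, normals)
```

A classification of this kind rests on a single measurement per subject, which is why the absence of individual reliability data noted above limits what can be concluded from it.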

Holmes (1967) found that subjects who
demonstrated a faster light reflex were more
likely to be aware of the contingencies in a
verbal conditioning task, and conditioned
more readily. The author interpreted these
results in terms of the hypothesized role of
acetylcholine in learning, the amount of
acetylcholine presumed to be indexed by speed
of constriction. In view of the complexity of
the mechanisms in both learning and pupillary
responses, this interpretation may well be
oversimplified.

In a series of articles, Lowenstein and
Loewenfeld (1950b, 1951, 1952a, 1952b,
[...]) described the effects of various
psychological and physiological variables on the
form of the light reflex. They employed, for
the most part, a 1-second light stimulus of
moderate intensity (about 15 footcandles).
Clinical evidence from humans and experimental
results on cats, monkeys, and rabbits
agreed in all major respects.

The authors found, in general, that
presentation of a light stimulus in temporal
proximity to an intense sensory or emotional
stimulus, or during a state of strong
excitement, produced a light reflex of
characteristic V or W shape, with slower and less
extensive constriction and a redilation phase
that may be faster and on occasion premature.
While electrical stimulation of the cervical
sympathetic chain interfered only minimally
with the light reflex, stimulation of the
posterior hypothalamus abolished it entirely.
Lesions to this hypothalamic area effected a
light reflex referred to as “tonohaptic” in
shape, with decreased latency, rapid but
quickly ending constriction, and rapid
redilation beginning with the termination of the
light stimulus. Partial lesions to the
parasympathetic pathway at the level of the
ciliary ganglion or Edinger-Westphal nucleus
decreased the speed and extent of pupillary
contraction to light.

On the basis of evidence of this kind,
Lowenstein and Loewenfeld (1950b) have
analyzed the light reflex in terms of the
interaction of sympathetic and parasympathetic
mechanisms, concluding that the “dynamic
structure” of the response “depends on the
[...] duration and timing of coinciding
sympathetic and parasympathetic impulses
[Lowenstein & Loewenfeld, 1950b, p. 373].”
They attributed the modification of the light
reflex under states of excitement or sensory
stimulation to the action of the posterior
hypothalamus.

Lowenstein and Loewenfeld (1952a, 1952b)
investigated the process of “fatigue” of
the light reflex (the weakening and changing
form of the response with its repeated
elicitation) as well as the phenomenon of
“psychosensory restitution” (the strengthening
of a previously fatigued or less-than-optimal
light reflex by sensory stimulation). Their findings
have led them to postulate different person- 
ality “types” according to the rate and man- 
ner in which an individual's light reflex mani- 
fests fatigue symptoms (Lowenstein & Loew- 
enfeld, 1951). Their notions as to the basis 
of these personality types in the relative 
strength and dominance of sympathetic and 
parasympathetic centers, and their tentative 
descriptions of the attendant psychological 
traits displayed by the different groups, have 
interesting implications for both personality 
study and for the general process of fatigue. 

Lowenstein and Loewenfeld's work has con- 
tributed to a better understanding of the 
mechanisms underlying the pupillary light re- 
flex, while at the same time pointing to useful 
and exciting applications of the measure. 
More research is necessary to understand the 
precise manner in which the sympathetic sys- 
tem exerts its influence upon the response 
to light, particularly in the case of processes 
like psychosensory restitution. A problem that 
Lowenstein and Loewenfeld's studies may 
present for the psychological reader is the 
general absence of reports of variability and 
reliability. This makes it difficult to assess 
directly the power and stability of some of
their findings.


PUPILLARY UNREST AND HIPPUS


The pupil, particularly in bright light, is
always in motion. These oscillations, referred
to as “pupillary unrest” or “hippus,” depending
on their severity, are of two kinds: slow
waves which, in darkness, last from about 4
to 40 seconds and measure up to an amplitude
of .5 millimeters; and superimposed
faster and smaller fluctuations of about .5-1-second
duration and .1-.3-millimeter extent,
under the same conditions (Lowenstein, Feinberg,
& Loewenfeld, 1963).

Lowenstein et al. (1963) have suggested 
that the slower, larger waves of pupillary con- 
traction and dilation might be attributed to 
central states of “arousal” and “fatigue.” 
These fluctuations in pupil size are more accentuated
under fatigue or drowsiness and in
cases of chronic, nonorganic fatigue. They
coincide with changes in muscle tone, respiration,
and heart rate, and may be increased or
decreased in frequency and extent by central
depressants or stimulants. Light reflexes
elicited during periods of large fluctuations of
this kind show the characteristic “excitement”
or “fatigue” forms, depending on whether
they are evoked during high (“aroused”) or
low (“drowsy”) phases of the movement
(Lowenstein et al., 1963). These spontaneous
movements of the pupil may thus provide a
further index of psychological activation.

The smaller, faster pupillary movements 
appear to be primarily parasympathetic in 
origin. The authors suggested that they may
be related to the subject’s inability to “fixate 
evenly [Lowenstein et al., 1963, p. 142].” 


CLASSICAL CONDITIONING OF PUPILLARY
RESPONSES


The possibility of conditioning the light re- 
flex has been the object of controversy for 
over 40 years. In the earlier literature on the 
subject there are but three reports of suc- 
cessful conditioning of pupillary constriction 
to light (Baker, 1938; Cason, 1922; Hudgins, 
1933). Each of them involved a mechanical
pupillometer, the use of which instrument has
been described as “an extremely difficult task
which requires the utmost concentration on
the part of E [the experimenter] [Young,
1954, p. 62].” Attempted replications of these
experiments outnumber the original studies
themselves (Crasilneck & McCranie, 1956;
Hilgard, Dutton, & Helmick, 1949; Hilgard,
Miller, & Ohlson, 1941; Hilgard & Ohlson,
1939; Steckle, 1936; Steckle & Renshaw,
1934; Wedell, Taylor, & Skolnick, 1940;
Young, 1954, 1958). Despite compulsively
faithful duplication of the original procedures
and apparatus in some of these replications,
and the use of more accurate modern measuring
devices in others, none of these attempts
succeeded in demonstrating conditioning.


Young (1958) gave an unusually detailed
critique of both Baker's ...

... have been reported.
Fitzgerald, Lintz, Brackbill, and Adams
(1967) and Brackbill, Lintz, and Fitzgerald
(1968) claim success in conditioning both
the light and darkness reflexes of the pupil.
Two aspects of these studies are somewhat
unusual. First, the experiments were conducted
with infants. Second, in both of these
studies conditioning was successful with a
temporal conditioning paradigm, but not with
an auditory conditioned stimulus (65-decibel
tone). The possibility that the tone in some
way inhibited conditioning is contradicted by
the fact that conditioning was demonstrated
with a combined temporal and auditory conditioned
stimulus. The fact that one of the
experimenters reportedly held the infant's
head during conditioning suggests possible
complications. Fitzgerald and Brackbill
attempted to condition the light and darkness
reflexes in adults, using different
intervals. They reported that although the
results were not significant (though in the
right direction), at least one of the
subjects in each of the six experimental
groups conditioned. It is not clear whether or
not the data for these individual subjects were
evaluated statistically. Studies by Kugelmass,
Hakerem, and Mantgiaris (1969) and ...
(1968) report, on the other hand, no evidence
for conditioning using light as the unconditioned
stimulus.

Conditioning of pupillary dilation with
shock, as opposed to a photic unconditioned
stimulus, has, in contrast, been demonstrated
a number of times. Harlow (1940) and Harlow
and Stagner (1933) reported conditioning
of dilation to a bell in cats and dogs,
with and without injection of curare. Kugelmass
et al. (1969), although unable to demonstrate
conditioning with a light unconditioned
stimulus, were successful when employing
auditory stimulation. Gerall and his
associates published several studies of this
kind, demonstrating conditioned pupillary dilation
in humans (Gerall, Sampson, & Boslov, 1957;
Gerall & Woodward, 1958) and in curarized
(flaxedil) and noncurarized cats (Gerall &
Obrist, 1962). The two studies on humans included
an unconditioned-stimulus-only control.
Of some interest in the Gerall and Obrist
(1962) and Gerall et al. (1957) studies was
the fact that dilation to offset of a light
stimulus was unsuccessful as an unconditioned
response.

The weight of the evidence suggests that
photic stimulation is not an effective unconditioned
stimulus in conditioning the pupil.
If in fact, as a minority of studies report, the
light or darkness reflexes are amenable to conditioning,
there is a need for the elucidation
of some necessary conditions that will account
for the many failures in this area.

The apparent ease with which the dilatory
response to nonphotic stimulation can be
conditioned supports a commonly stated assumption
that some type of motivational
component is a necessary requirement for
classical conditioning.


DISCUSSION 


There is ample evidence for the sensitivity
and potential usefulness of the pupillary system
as an index of various states of activation
or arousal. There is a need, however, in
pupillometric research as in other areas, for
definitive studies of the basic parameters of
response.

Methods for quantifying the form characteristics
of the pupillary responses, together
with a determination of their stability both
within and across subjects, would facilitate
research in this area. This is particularly true
in connection with work on the light reflex,
where great emphasis has been placed upon
variations in the shape of the response (its
speed, extent, etc.) as a basis for inferring
physiological processes and for comparing and
classifying individuals. Lowenstein and Loewenfeld
have made use of differential curves in
order to clarify the speed characteristics of
the pupillary response. It would be useful to
specify some standard criteria by which variations
in the shape of the response might be
identified in a relatively objective manner.
Such criteria would facilitate the comparison
of responses across subjects and conditions,
and would make it easier to assess the degree
to which individuals manifest stability in
the form of their pupillary reactions under
normal conditions.

A variable that has received too little attention
in studies of pupillary response is
that of prestimulus diameter. It is a well-accepted
principle in psychophysiology that the
amplitude of an autonomic response is
a function of prestimulus or “initial”
level (“law of initial value”). Beck (see Footnote 6),
referring to his own work, has stated
that pupil size is a “critical variable in the
determination of the form of the photopupil
reflex [p. 4].” Feinberg and Podolak (1965),
in a study of the pupillary light reflex in relation
to aging, reported a negative relation
between latency of response and original
pupil size. The possibility that some of the
findings regarding the form of the light reflex
have been confounded by variations in initial
diameter is difficult to evaluate, given the
general lack of explicit reference to this
variable.
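
As an illustration of the kind of standard criteria discussed above, the following Python sketch derives a differential (velocity) curve from a sampled record of pupil diameter and reports a few form measures together with the prestimulus diameter. It is a minimal sketch under stated assumptions, not a procedure taken from the studies reviewed: the function names, the onset criterion of .5 millimeters per second, and the sample record are all hypothetical.

```python
def differential_curve(diameters_mm, sample_rate_hz):
    """Rate of change of pupil diameter (mm/sec) between successive samples."""
    return [(later - earlier) * sample_rate_hz
            for earlier, later in zip(diameters_mm, diameters_mm[1:])]


def describe_light_reflex(diameters_mm, sample_rate_hz, stimulus_index,
                          onset_speed_mm_per_s=0.5):
    """Summarize one light reflex record with a few form measures.

    The onset criterion is an arbitrary, hypothetical threshold.
    """
    velocity = differential_curve(diameters_mm, sample_rate_hz)
    prestimulus = diameters_mm[stimulus_index]
    # Latency: time from the stimulus to the first interval in which the
    # pupil constricts at least as fast as the onset criterion.
    onset = next((i for i in range(stimulus_index, len(velocity))
                  if velocity[i] <= -onset_speed_mm_per_s), None)
    latency_s = None if onset is None else (onset - stimulus_index) / sample_rate_hz
    return {
        "prestimulus_mm": prestimulus,
        "latency_s": latency_s,
        # Fastest constriction after the stimulus, reported as a positive speed.
        "peak_constriction_mm_per_s": -min(velocity[stimulus_index:]),
        # Extent: prestimulus diameter minus the smallest diameter reached.
        "extent_mm": prestimulus - min(diameters_mm[stimulus_index:]),
    }


# Hypothetical record sampled 10 times per second; the light comes on at sample 2.
record = [5.0, 5.0, 5.0, 5.0, 4.9, 4.6, 4.2, 4.0, 3.9, 3.9, 4.0, 4.2]
summary = describe_light_reflex(record, sample_rate_hz=10, stimulus_index=2)
print(summary)
```

Reporting the prestimulus diameter alongside latency, speed, and extent would at least make it possible to check whether differences in the form of the response are accompanied by differences in initial level.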

Any thorough evaluation of the pupil as a 
psychophysiological measure must include 
comparative studies with other autonomic 
variables. There is some evidence for correla- 
tions of pupillary dilation with respiration 
under hypothalamic stimulation (Ranson & 
Magoun, 1933); with cardiac acceleration in 
response to voluntary breathing changes 
(Harsh, Beebe-Center, & Stevens, 1939); and 
with heart rate and skin potential during 
voluntary pilomotor activity (Lindsley & 
Sassaman, 1938). A recent study by Kahne- 
man, Tursky, Shapiro, and Crider (1969), in 
which pupil size, heart rate, and skin re- 
sistance were plotted during a digit-trans- 
formation task, revealed strikingly similar 
changes in all three systems, with the pupil- 
lary data being most consistent. Colman and 
Paivio (1969) monitored pupillary activity 
and the galvanic skin response during an 
imagery task. The authors found that pupil 
size, but not galvanic skin response, differen- 
tiated significantly between concrete and ab- 
stract words, leading them to infer that pupil 
size was the more sensitive measure of cogni- 
tive activity. Studies by Bernick et al. (see 
Footnote 3) and Gibney (see Footnote 8), 
already referred to, suggested a possible su- 
periority of pupil size over heart rate and 
corticosteroid levels, and galvanic skin re- 
sponse, respectively. With these exceptions, 
there seem to be few if any rigorous studies in 
which the pupil and other autonomic re- 
sponses are systematically correlated or com- 
pared as to their power to discriminate among 
different stimulus conditions.

The pupil has certain distinctive features
which, other considerations aside, are pertinent
to its use as a psychophysiological variable.
Although not unique in being recipro-
cally innervated by the sympathetic and 
parasympathetic systems, the pupil is dis- 
tinctive in that it provides, by means of the 
light reflex, at least a partially autonomous 
index of parasympathetic activity. This may 
allow for a better understanding of such 
things as individual differences in pupillary re- 
sponse; Lowenstein and Loewenfeld's work on 
personality types (Lowenstein & Loewenfeld, 
1951) is an example of such an application. 

Unlike, for example, the circulatory and 
respiratory systems, the pupil is not involved 
in crucial vegetative functions, making it 
somewhat easier to evaluate and control its
responses. Furthermore, pupil size can be 
directly measured by means of photographic 
or related techniques. This makes interpreta- 
tion and scoring more straightforward than in 
the case of such variables as blood volume or 
skin conductance. 

The inherent difficulties in using the pupil 
to assess the psychological impact of visual 
stimuli have already been discussed. Pupil 
dilation presents a further methodological 
problem. Given the ceiling effect of a maxi- 
mally dilated pupil, it would seem reasonable 
to assure a small initial pupil diameter by
measuring the pupil under conditions of mod- 
erate to strong illumination. As has already 
been mentioned, however, increasing illumina- 
tion is accompanied by greater instability of 
the pupil, and thus by added error variance. 
As a result, the measure may have less than 
optimal power. The light reflex is free of this 
type of difficulty, but, at the same time, it is 
not a continuous measure, nor can we yet be 
sure whether the relevant changes in form of
the response are sufficiently stable and quantifiable.
Research is thus needed both to evaluate
the relative merits of the different pupillary
responses and to assess the usefulness
of the pupil in comparison with other
response systems.


LEAST SQUARES APPLICATION OF LEVINE'S HYPOTHESIS
TO MISSING REWARD SEQUENCE SITUATIONS


WILLIAM J. THOMSON 1


Vanderbilt University 


Levine has proposed two methods of application of his hypothesis behavior model
to three-trial object-discrimination learning-set data. Method I uses a subset of
equations for parameter estimation and is applicable to missing reward sequence
data, but the parameter estimates are not least square estimates. Method II
realizes least square estimates but is inapplicable to missing reward sequence data.
The method described in this article derives least square estimates for 9 of the 15
possible three-trial missing reward sequence situations. Application of this method
to two sets of data indicated that reliable parameter estimates may be obtained
from as few as two reward sequences, and these parameter estimates were found
to have adequate predictive power.


Levine (1965) has described a hypothesis
model of problem-solving behavior and two
methods of application of the model to object-discrimination
learning-set data. The model
assumes that a naive subject samples exactly
one of nine hypotheses at the start of each
three-trial object-discrimination learning-set
problem and uses that single hypothesis until
the problem terminates. For example, one hypothesis
available to the subject is position
preference, and it is assumed that this hypothesis
is sampled with probability a. Thus, with
probability a, the subject will consistently respond
to a single position for the duration of
the problem. To estimate the hypothesis probabilities
from the data, a set of 32 equations, one
for each possible reward sequence-response
outcome combination, is derived. Each of these
equations represents an equality between the
sum of the probabilities for all hypotheses leading
to observations in a reward sequence-response
outcome cell and a proportion for that
cell estimated from the data.

Given these 32 basic equations, Levine
(1965) illustrated two methods of application
of the model to object-discrimination learning-set
data. One method of estimation is to take
9 (or more) of the 32 equations and solve for
the nine parameters (Method I). The disadvantages
of this method are twofold:


1. Reasonable rules of inclusion and exclusion
of equations must be formulated;
2. The estimates of parameters are not least
square estimates.


It should also be noted that all of the data are
not used in estimation, a disadvantage if major
interest is in the value of the parameters, but
an advantage if additional data are desired
for assessing the predictions of the model.
Levine also described a least squares method
of parameter estimation that uses all of the
data and leads to straightforward algebraic
equations for each parameter (Method II).
However, this method also has two drawbacks:


1. Since all of the data are used in estimation,
none remains for prediction;
2. The particular equations derived by
Levine (1959) become indeterminate if all four
of the reward sequences do not occur in a particular
experiment.


The remainder of this article describes a
least squares method of parameter estimation
that may be applied to missing reward sequence
situations. In this method all of the data
for particular reward sequences are used, but
all reward sequences do not have to occur.
Thus, least square estimates may be obtained
for experiments that omit certain reward sequences,
or least square estimates may be
obtained from certain subsets of data and used
to predict remaining data.


1 The author wishes to thank Marvin Levine and
Lewis Bettinger for making available the data analyzed
in this article.
Requests for reprints and for the listing of inverses
should be sent to William J. Thomson, Department of
Psychology, Vanderbilt University, Nashville, Tennessee 37203.


MODEL


                                          Response outcome
Reward
sequence   - - -      - - +      - + -      - + +      + + +        + + -      + - +      + - -

AAA        a+c+f+r    p3+r       b+d+r      e+p2+r     a+c+e+p2+r   r          b+d+p3+r   f+r
AAB        c+r        a+f+p3+r   d+e+r      b+p2+r     c+p2+r       a+e+r      d+f+p3+r   b+r
ABA        b+c+e+r    p3+r       a+d+r      f+p2+r     b+c+f+p2+r   r          a+d+p3+r   e+r
ABB        c+r        b+e+p3+r   d+f+r      a+p2+r     c+p2+r       b+f+r      d+e+p3+r   a+r


FIG. 1. Hypothesis probabilities as a function of reward sequence and response outcome.


The derivations involved in this procedure
are quite similar to those described by Levine 2
and are sketched only briefly. First, the nine
available hypotheses and associated probabilities
of sampling are:


1. Position preference (a): constant responding
to a single position.

2. Position alternation (b): successive alternation
of response between positions.

3. Stimulus preference (c): constant responding
to a single stimulus object.

4. Stimulus alternation (d): successive alternation
of response between two stimulus
objects.

5. Win-stay-lose-shift with respect to position
(e): responding to the same position after
reward; responding to other position after
nonreward.

6. Lose-stay-win-shift with respect to position
(f): responding to the same position after
nonreward; responding to other position after
reward.

7. Win-stay-lose-shift with respect to object
(p2): responding to the same object after reward;
responding to other object after nonreward.

8. Third trial learning (p3): responding correctly
on the third trial following an incorrect
response on Trial 2.

9. Residual category (R): idiosyncratic hypotheses
treated as a single hypothesis leading
to equivalence of all response sequences.

To represent Levine's (1965) notation, let AAA denote
a reward sequence in which the rewarded
object is always on the same side. Then ABA
represents alternation of the reward, and so on,
for a total of four possible reward sequences.
Next, let +_i and -_i denote correct and incorrect
responses, respectively, on trial i. Then
the theoretical probability of the joint occurrence
of each response outcome with each reward
sequence may be enumerated in terms of
the hypothesis probabilities. The theoretical
probabilities applicable to each reward sequence-response
outcome cell are shown in
Figure 1.


2 M. Levine, A model of hypothesis behavior in discrimination
learning set. Unpublished doctoral dissertation,
University of Wisconsin, 1959.


To estimate the hypothesis probabilities
from data, note that, for example, in reward
sequence ABA,

$$P(-_1 +_2 -_3 \mid ABA) = P(-_1 \mid ABA)\,(a + d + r),$$

where r = R/4. Then

$$P(+_2 -_3 \mid A -_1 BA) = a + d + r.$$

But

$$P(+_2 -_3 \mid A -_1 BA) = \frac{n(A -_1 B +_2 A -_3)}{n(A -_1 BA)},$$

where n(S) is the frequency of Sequence S.
Thus,

$$\frac{n(A -_1 B +_2 A -_3)}{n(A -_1 BA)} = a + d + r.$$

In similar fashion a total of 32 equations may
be obtained, 1 for each cell in Figure 1 (see
Levine, Footnote 2, for details). Since n(S) is
observable from the data, estimates of the
hypothesis parameters may be obtained. Consider
estimates based only on the reward
sequences ABA and ABB. (Similar derivations
apply to any subset of reward sequences.) One
may list all of the equations containing the
parameter a:

$$\begin{aligned}
a + d + r &= Q_{3,3} \\
a + d + p_3 + r &= Q_{3,7} \\
a + p_2 + r &= Q_{4,4} \\
a + r &= Q_{4,8},
\end{aligned}$$

where Q_{i,j} is the observable proportion for cell
(i,j), that is, for row i and column j of Figure 1. Summing,

$$4a + 2d + p_2 + p_3 + 4r = Q_{3,3} + Q_{3,7} + Q_{4,4} + Q_{4,8} = Q_a.$$

A similar equation may be obtained for each of
the nine parameters. Thus, for any reward sequence
subset, one has a set of nine equations
in nine unknowns, which may or may not have
a unique solution. By restructuring these equations
in matrix notation, the existence of a
solution and the solution itself may easily be
determined. Let P represent the column vector
of parameters and Q represent the column
vector of Q values, such that

$$P = (a,\ b,\ c,\ d,\ e,\ f,\ p_2,\ p_3,\ r)^{\mathsf{T}}, \qquad
Q = (Q_a,\ Q_b,\ Q_c,\ Q_d,\ Q_e,\ Q_f,\ Q_{p_2},\ Q_{p_3},\ Q_r)^{\mathsf{T}}.$$

Then let C represent a 9 X 9 matrix, each row
of which is composed of the coefficients of the
9 parameters for one of the 9 simultaneous
equations. That is, Row 1 for ABA-ABB is
4,0,0,2,0,0,1,1,4 since the first equation for this
reward sequence subset is:

$$4a + 0b + 0c + 2d + 0e + 0f + 1p_2 + 1p_3 + 4r = Q_a.$$

Then the system of linear equations may be
represented by

$$C \times P = Q.$$

Multiplying by the inverse of C, C^{-1},

$$C^{-1} \times C \times P = C^{-1} \times Q.$$

Then,

$$I \times P = C^{-1} \times Q,$$

where I is the identity matrix. Thus a unique
solution to the system of equations exists if
and only if C^{-1} exists. The proof that this solution
is a least squares solution is identical to
Levine (pp. 86-88, see Footnote 2) and is not
given here.

Of the 15 possible subset combinations of
three-tuple reward sequences, C^{-1} was found to
exist for the nine subsets of sequences found in
Table 1. Inverses did not exist for C in any of
the single reward sequence subsets, nor for the
doublet subsets AAA-ABA and AAB-ABB.
(By appropriate restrictions on the parameters,
solutions could be found for these sets
of equations, but such restrictions are not discussed
here.) Thus, for example, the present
method would not be applicable to an experiment
containing only reward sequences AAA
and ABA.
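
The existence of C^{-1} for each subset can be checked mechanically. The Python sketch below is an illustration only, not the author's program: it assumes that the nine hypothesis definitions given earlier fully determine which parameters enter each reward sequence-response outcome cell, builds C for every subset of reward sequences, and tests whether the system can be solved in exact arithmetic. All function names and the parameter values used in the final check are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

PARAMS = ["a", "b", "c", "d", "e", "f", "p2", "p3", "r"]   # r = R/4
SEQUENCES = ["AAA", "AAB", "ABA", "ABB"]


def other(pos):
    return "B" if pos == "A" else "A"


def predict(hyp, seq, first_correct):
    """Outcomes (True = correct) on Trials 1-3 implied by one hypothesis."""
    pos = seq[0] if first_correct else other(seq[0])   # side chosen on Trial 1
    out = [first_correct]
    for t in (1, 2):
        prev = out[-1]
        if hyp in ("a", "b", "e", "f"):                # position hypotheses
            if hyp == "b":                             # alternate positions
                pos = other(pos)
            elif hyp == "e":                           # win-stay-lose-shift
                pos = pos if prev else other(pos)
            elif hyp == "f":                           # lose-stay-win-shift
                pos = other(pos) if prev else pos
            out.append(pos == seq[t])
        elif hyp == "c":                               # same object throughout
            out.append(first_correct)
        elif hyp == "d":                               # alternate objects
            out.append(not prev)
        elif hyp == "p2":                              # correct from Trial 2 on
            out.append(True)
        else:                                          # p3: wrong on 2, right on 3
            out.append(t == 2)
    return tuple(out)


def design(subset):
    """One 0/1 row per cell: which parameters enter that cell's equation."""
    rows = []
    for seq in subset:
        for cell in product((False, True), repeat=3):
            row = [int(predict(h, seq, cell[0]) == cell) for h in PARAMS[:-1]]
            rows.append(row + [1])                     # r enters every cell
    return rows


def normal_matrix(rows):
    """C: coefficients obtained by summing the equations containing each parameter."""
    n = len(PARAMS)
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]


def solve(C, q):
    """Gauss-Jordan elimination in exact arithmetic; None if C is singular."""
    n = len(C)
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(C, q)]
    for col in range(n):
        piv = next((i for i in range(col, n) if M[i][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for i in range(n):
            if i != col and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [x - factor * y for x, y in zip(M[i], M[col])]
    return [M[i][n] for i in range(n)]


# Which of the 15 subsets of reward sequences give an invertible C?
solvable = [s for k in range(1, 5) for s in combinations(SEQUENCES, k)
            if solve(normal_matrix(design(s)), [0] * 9) is not None]
print(len(solvable), "of 15 subsets are solvable")

# Hypothetical parameter values (a + b + ... + p3 + 4r = 1), recovered from
# error-free cell proportions for the ABA-ABB subset.
true_p = [Fraction(x, 20) for x in (2, 1, 2, 1, 1, 1, 6, 2, 1)]
rows = design(("ABA", "ABB"))
cells = [sum(x * p for x, p in zip(row, true_p)) for row in rows]
q = [sum(row[k] * c for row, c in zip(rows, cells)) for k in range(9)]
recovered = solve(normal_matrix(rows), q)
```

Under these assumptions the first row of C for ABA-ABB can be compared directly with the row 4,0,0,2,0,0,1,1,4 given in the text, and the list `solvable` can be compared with the subsets in Table 1.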


TEST OF THE METHOD


This method of parameter estimation was
applied to object-discrimination learning-set
data for 30 young rhesus monkeys from
Harlow, Harlow, Rueping, and Mason (1960).
In this experiment, three groups of subjects
were defined by initiating training at 60, 90, or
120 days. Within each of these groups, parameter
estimates were obtained over every block
of 1,000 problems for each of the reward sequence
subsets in Table 1, for a total of ...
blocks per group. In order to estimate the
reliability of the parameter values from missing
data, the correlations between the parameter
estimates for the total data (i.e., AAA-AAB-ABA-ABB)
and the parameter estimates for
each of the eight remaining reward sequence
subsets were calculated for every block, and the
mean of these correlations for each group is
given in Table 1.
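
The reliability index described above is an ordinary product-moment correlation computed across the nine parameter estimates. A minimal Python sketch follows; the two sets of estimates are hypothetical values chosen for illustration and are not taken from either data set.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of estimates."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical estimates of (a, b, c, d, e, f, p2, p3, r) for one block:
# from all four reward sequences, and from a missing reward sequence subset.
total_data = [0.10, 0.05, 0.10, 0.05, 0.05, 0.05, 0.30, 0.10, 0.05]
subset_data = [0.12, 0.04, 0.09, 0.06, 0.05, 0.04, 0.31, 0.09, 0.05]

r_value = pearson_r(total_data, subset_data)
print(round(r_value, 3))
```

Because the correlation is taken over only nine values, a single large parameter (here the hypothetical p2) can dominate it, which is worth keeping in mind when reading the uniformly high values of r in Table 1.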

Next, a statistic called “proportion of variance
explained” (Levine, 1965, p. 10) was calculated
to assess the predictive qualities of this
method of estimation. Proportion of variance
explained (PVE) may be defined as:

$$\mathrm{PVE} = 1 - \frac{\sigma^2_{O-P}}{\sigma^2_{O}},$$

where

$$\sigma^2_{O-P} = \frac{\sum (O - P)^2}{N},$$

P = predicted proportions,
O = obtained proportions for same cells,
N = number of cells predicted,

and \sigma^2_{O} is the variance of the obtained values
for the predicted cells. Thus for the AAB-ABA-ABB
sequence subset in Table 1, 24 cells are
used to estimate parameters. Using these
parameter estimates, the proportions for the
eight remaining cells are predicted, and the
proportion of variance explained is calculated
from the eight predicted proportions and the
eight corresponding observed proportions.
Mean proportion of variance explained over
blocks for all sequence subsets is given in
Table 1.


TABLE 1
MEAN CORRELATIONS AND PROPORTIONS OF VARIANCE EXPLAINED (PVEs) FOR MISSING REWARD SEQUENCES

                          Harlow et al.    Harlow et al.    Harlow et al.
                          Group 60         Group 90         Group 120        Bettinger et al.
Sequence set              r       PVE      r       PVE      r       PVE      r       PVE

1. AAA AAB ABA ABB        1.000a  --       1.000a  --       1.000a  --       1.000a  --
2. --  AAB ABA ABB        .987    .912     .990    .945     .998    .974     .994    .844
3. AAA --  ABA ABB        .989    .833     .987    ...      .995    .911     .986    .648
4. AAA AAB --  ABB        .985    .760     .982    ...      .993    .811     .995    .778
5. AAA AAB ABA --         .987    .795     .985    ...      .994    .900     .994    .758
6. AAA AAB --  --         .935    .570     .972    ...      .973    .824     .885    .436
7. AAA --  --  ABB        .968    .763     .956    ...      .969    .806     .969    .663
8. --  AAB ABA --         .970    .868     .972    ...      .988    .929     .988    .823
9. --  --  ABA ABB        .848    .602     .772    ...      .873    .786     .9357   .676

Note. -- indicates data from a reward sequence that was not included in the least squares estimate of the parameter set.
a Constrained to be 1.000.

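
The proportion of variance explained defined in the text can be written as a short Python function. This is a minimal sketch: it assumes that the variance of the obtained values uses the same divisor N as the error term, and the eight obtained and predicted proportions below are hypothetical.

```python
def proportion_variance_explained(obtained, predicted):
    """PVE = 1 - (mean squared prediction error) / (variance of obtained values)."""
    n = len(obtained)
    mean_obtained = sum(obtained) / n
    var_error = sum((o - p) ** 2 for o, p in zip(obtained, predicted)) / n
    # Assumption: the variance of the obtained proportions also divides by N.
    var_obtained = sum((o - mean_obtained) ** 2 for o in obtained) / n
    return 1 - var_error / var_obtained


# Hypothetical proportions for the eight cells of one omitted reward sequence.
obtained = [0.10, 0.05, 0.15, 0.70, 0.80, 0.05, 0.10, 0.05]
predicted = [0.12, 0.06, 0.12, 0.70, 0.78, 0.06, 0.11, 0.05]

pve = proportion_variance_explained(obtained, predicted)
print(round(pve, 3))
```

With this definition, predictions that match the obtained proportions exactly give a value of 1, and predicting the mean obtained proportion for every cell gives a value of 0.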

This method of parameter estimation was
also applied to object-discrimination learning-set
data for speciosa macaque monkeys from
Bettinger, Anderson, and Meyer.3 There was
a total of 196 problems for each of 11 subjects
for a total of 2,156 problems per block. Parameter
estimates were obtained for every block of
trials for each of the nine allowable reward sequence
subsets, and correlations between the
parameter estimates for the total data and
each missing reward sequence subset were calculated.
Proportion of variance explained was
also calculated for each block of trials.
Mean values for r and proportion of variance
explained are given in Table 1.


3 L. Bettinger, R. Anderson, & D. R. Meyer. Discrimination,
quasi-discrimination, and delayed response
performance in brain operated monkeys. Paper read
at the Midwestern Psychological Association, Chicago.


Overall values of both r and proportion
of variance explained for all data were consistently
high, comparable to those reported by
Levine (1965) in evaluation of his Method I 
and Method II parameter estimation proce- 
dures for total data. With the exception of the 
idiosyncratic behavior of Sequence Subsets 6 
and 9 discussed below, no systematic differ- 
ences in r or proportion of variance explained 
were noted between sequence subsets. How- 
ever, both of these statistics increased slightly 
over blocks of trials within a sequence subset, 
since the reduction of residual responding (R) 
led to more reliable parameter estimates. 
Sequence Subsets 6 and 9 consistently led to
values of proportion of variance explained
which were low. The parameters with greatest
value (a, c, p2, p3: late blocks; and R: early
blocks) were the same for blocks in which proportion
of variance explained was high as those
for which proportion of variance explained was
low, but some constant errors in direction of
estimation were apparent. In particular, the
estimate of a (position preference) from Sequence
Subset 6 was high, while the estimate of
a from Sequence Subset 9 was low. Note that
a is the only one of these five major parameters
that is differentially reinforced for different
reward sequences (i.e., only a appears in different
columns in Figure 1). If one assumes that
the values of parameters are determined largely
by the frequencies in the +++ column
(which contained by far the most observations),
the observed errors in estimation of a would be
expected. Thus the low values of proportion
of variance explained are peculiar to the differences
in estimation of a from Sequence Subsets
6 and 9 within the same data. That is, one
would not expect consistent errors for an experiment
in which only Sequence Subset 6 (or
9) was present.

In summary, the proposed method of param- 
eter estimation contains all of the advantages 
of Levine's (1965) Methods I and II, without 
the accompanying disadvantages. The correla- 
tional analyses indicate that reliable estimates 
of parameters may be made from as few as 
two appropriate reward sequences, and the pro- 
portion of variance explained analysis shows
that such parameter estimates have adequate
predictive power. 


REFERENCES 


Harlow, H. F., Harlow, M. K., Rueping, R. R., &
Mason, W. A. Performance of infant rhesus monkeys
on discrimination learning, delayed response, and
discrimination learning set. Journal of Comparative
and Physiological Psychology, 1960, 53, 113-121.

Levine, M. Hypothesis behavior. In A. M. Schrier,
H. F. Harlow, & F. Stollnitz (Eds.), Behavior of
nonhuman primates. Vol. 1. New York: Academic
Press, 1965.


(Received August 7, 1970) 


Psychological Bulletin
1972, Vol. 77, No. 5, 361-372


ON THE AMBIVALENCE-INDIFFERENCE PROBLEM IN 
ATTITUDE THEORY AND MEASUREMENT: 
A SUGGESTED MODIFICATION OF THE SEMANTIC 
DIFFERENTIAL TECHNIQUE 1


KALMAN J. KAPLAN 2


Wayne State University


This article explores the alternate meanings of attitudinal neutrality in the context
of the bipolarity-reciprocal antagonism issue. Specifically, it proposes a modification
of the semantic differential technique wherein the “liking” and “disliking” components
of attitude can be separately measured. A geometrical model is developed
in which three nondirectional attitude variables (“total affect,” “ambivalence,”
and “polarization”) are distinguished from the usual attitude variable. Reliability
and validity data are presented, and an application of the model is discussed.


In his 1935 review of the general area of
attitude theory and research, Gordon Allport
noted that most investigators basically
agreed that “attitude is a learned predisposition
to respond to an object in a consistently favorable
or unfavorable manner [p. 818].” Furthermore,
he suggested that this bipolarity in the
direction of an attitude (i.e., the favorable
versus unfavorable) was viewed as its most
distinctive feature. Thus, he felt that attitude
could be conceptualized as a simple unidimensional
concept, the evaluative (or affective)
dimension.

A number of attitude theorists and researchers,
including Allport himself, have been
dissatisfied with this unidimensional view and
have criticized it as oversimplified. Basically, the
critics fall into two major camps: (a) multicomponent
attitude theorists, who feel the
evaluative dimension or component is not a
sufficient definition of attitude; and (b) single-component
attitude theorists, who feel that the
evaluative dimension, though a sufficient definition
of attitude, is not itself unidimensional.
Much of the early controversy emanated
from the first camp, focusing on the question
of whether cognitive and conative aspects
must be considered in addition to evaluations
for a workable definition of attitude (cf.
Fishbein, 1963, 1965, 1967; Rosenberg, 1956,
1960). The preponderance of current research,
however, has emerged from the second camp
(e.g., Komorita & Bass, 1967; Smith, 1961;
Wiggins & Fishbein, 1969). Additionally, a
number of studies have specifically been interested
in the implications of the bipolarity-reciprocal
antagonism issue for the interpretation
of attitudinal neutrality (e.g., Bass &
Rosen, 1969; Green & Goldfried, 1965;
Herzberg, Mausner, & Snyderman, 1959). It
is the attempt to conceptually and methodologically
distinguish ambivalence from indifference
that represents the central concern of
the present study.


1 This paper was adapted in part from the author's
thesis conducted at the University of Illinois,
under the direction of Martin Fishbein and
Ledyard Tucker. The author would like to thank Joel Ager,
Alan Bass, Reuben Baron, Ira Firestone, and Cary Lichtman for
comments on earlier drafts of this paper.
2 Requests for reprints should be sent to Kalman J.
Kaplan, Department of Psychology, Wayne State
University, Detroit, Michigan 48202.


Some General Requirements for the Measurement 
of Ambivalence 


Despite its obvious theoretical importance, there exists a dearth of
measurement procedures specifically designed to tease out ambivalence
from other attitudinal variables. To a large extent this has been due
to the prevailing view of attitudes as a directional bipolar evaluative
response representing, on the one hand, positive feelings, appraisals,
and tendencies; and on the other hand, negative feelings, appraisals,
and tendencies (again, see the review article of Allport, 1935). Yet
despite this view, researchers have often found the need to postulate
nondirectional attitude variables (e.g., intensity, affective
involvement). As Scott's recent review of attitude measurement pointed
out:


The "directional" property raises, but does not auto- 
matically answer, the question as to how neutral atti- 
tudes are to be conceived. Are they midway between 
positive and negative attitudes? Should they be sub- 
divided into indifferent and ambivalent attitudes? The 
conception of favorable and unfavorable as “opposites” 
implies that persons will not be found with attitudes 
simultaneously at both ends of the dimension. Yet an 
alternative formulation might treat degree of favorable- 
ness and degree of unfavorableness as conceptually 
distinct (although no doubt empirically correlated) 
components, on which persons may make, simultane- 
ously, a variety of position combinations. In other 
words, it is only by convention that the direction of an
attitude is conceptualized as a single bipolar attribute
[Scott, 1968, p. 206].


In other words, the measurement of ambiv- 
alence seems to require a situation in which 
an individual has the opportunity to simul- 
taneously indicate both a favorable and an 
unfavorable attitude toward a given stimulus 
object. Such a technique represents a marked 
contrast to the typical measurement procedures 
discussed above which allow a respondent to 
make one and only one evaluative response to 
a given object. This overall response is then 
taken as an indication of either a favorable, 
a neutral, or an unfavorable attitude, and we 
are left with ambiguity in the interpretation of neutrality
(i.e., indifference versus ambivalence).

The argument might be raised that measures 
of ambivalence are actually obtainable from 
the usual directional procedure through the 
separate consideration of items eliciting responses
that indicate favorable and unfavorable attitudes
(e.g., adjective checklist). This
does not seem entirely satisfactory, however, 
since it is possible that individuals may in fact 
be ambivalent to each item and still indicate a 
favorable, neutral, or unfavorable response to 
it. Thus, the ambivalence problem is only re- 
moved one level, but not dealt with directly. 
A solution seems to demand a basic modifica- 
tion in the measurement technique itself. 


Ambivalence and the Semantic Differential 


The technique to be offered in this article represents a modification
of the semantic differential (SD) scale (Osgood, Suci, & Tannenbaum,
1957). The SD scale is used as the point of departure for several
reasons. For one, the technique is probably the most popular, flexible,
and easy to use of current measurement techniques. It offers to
researchers a do-it-yourself attitude-measurement kit, easily
transferable across situations. Second, the SD technique has been
buttressed more than other current measurement techniques by a
well-developed theoretical model: a mediational learning approach
viewing semantic space as bipolar and defining attitude as the bipolar
evaluative dimension.

A concept is rated on the SD as follows:

X  (−3)  (−2)  (−1)  (0)  (1)  (2)  (3)  Y

in which the scale positions have already been defined for the subject
in the instructions as:

(−3) extremely X        (3) extremely Y
(−2) quite X            (2) quite Y
(−1) slightly X         (1) slightly Y
(0) neither X nor Y; equally X and Y

When an individual is asked to rate a concept on a SD scale, he is
instructed to make only one check mark. He can respond with:

[X-Y scale display with a single check mark]

or

[X-Y scale display with a single check mark at a different position]

He cannot, however, attempt to indicate ambivalence toward a concept by
checking both sides of the continuum:

[X-Y scale display with check marks on both sides]


For Osgood's major purposes this represents no problem. He bases his
theoretical underpinnings for the SD technique on the assumption of
reciprocally antagonistic XY pairs (i.e., an object cannot be both X
and Y), making the question of ambivalence irrelevant. In Osgood's own
terms:


First, since [italics added] the polar terms [...] meaningful
opposites, we assume that [...] characteristic of X will be
reciprocally [...] that characteristic of Y (i.e., wherever a component
of X is rm, the same component of Y will be [...], and conversely).
Second, since [italics added] as [...] in subsequent chapters, scales
are [...] which maximize one factor or component and minimize [...],
the rm pattern elicited by an X-Y set [...] one dominant component
[Osgood et al., 1957, p. ...].

Obviously, the assumption of bipolarity is critical to the question of
ambivalence. Unfortunately, the restrictions of the single-marking SD
instructions have made it impossible for this assumption to be
adequately tested. Rather, because of the forced bipolar structure of
the SD scale, polar adjectives are arbitrarily required to be
reciprocally antagonistic, leaving no room for disconfirmation. Green
and Goldfried (1965) have pointed to this problem and advocated the use
of scales that do not force the reciprocal antagonism as a means of
determining whether or not semantic space and individual semantic
differential scales actually are bipolar.

The technique Green and Goldfried have used involves the
intercorrelation of sets of single-adjective unipolar scales each
constructed from the same bipolar adjectival SD scale. The specific
assumptions and instructions of this single-adjective technique are
discussed shortly as a way of contrast with the technique proposed in
the present article. Basically, high negative intercorrelations between
two single-adjective scales are interpreted as indicating the
bipolarity of these adjectives. Specifically, they calculated 9,750
correlation coefficients and found that certain adjective pairs (such
as good-bad) tended to be highly negatively correlated with each other
for [...] concepts. However, as an overall pattern, Green and Goldfried
were forced to conclude:


There were wide differences in occurrence [...] in the [...] concepts.
There were also wide [...] occurrence of semantic differential
opposites [...] scales. Any tendency for [...] antagonistic adjectives
to form the two poles [...] single [...] was clearly dependent on the
concept [...] concepts involved and therefore was not a generalized
phenomenon [Green & Goldfried, 1965, p. 30].


Green and Goldfried's technique has been recently criticized by Heise
(1969) on the grounds that unipolar scale ratings are more affected by
the unintended denotative, peripheral, and fleeting aspects of
adjectives, thus involving more sources of variance than bipolar scale
ratings. Nonetheless, Heise concluded that a number of SD scales do not
meet the assumption of bipolarity. This conclusion has direct
implications for the present article: single-marking SD instructions
often force the subject to perform a psychologically impossible task,
namely to directionally evaluate a concept in a manner eliminating
ambivalence when he is faced with adjective pairs he does not judge to
be bipolar.

The matter, however, is more complicated 
than has been suggested. Though Osgood et al. 
proposed a semantic theory in which opposite 
pairs are reciprocally antagonistic, they, in 
practice, allowed the subject a response through 
which he can try to indicate that he believes 
both sides of the scale are appropriate for his 
response to the concept. Consider the middle 
(neutral) category: it is alternatively defined 
as "neither X nor Y" or “equally X and Y." 
In other words, a check in this “neutral” cate- 
gory can indicate either (a) neither X nor Y 
is an appropriate response to the concept (i.e., 
feelings of indifference exist toward the concept);
or (b) both X and Y are equally appropriate
responses to the concept (i.e., feelings
of ambivalence exist toward the concept). 

Through their dual definition of the neutral category, Osgood et al.
have built a safeguard into their instructions to allow their usage
whether or not XY pairs are perceived as reciprocally antagonistic.
Their safeguard, however, has been exacted at the price of clarity of
interpretation of results not only at the neutral category but at every
category on a SD scale. Allowing an ambivalent "neutral" response in
the context of single-marking instructions produces a fundamental
change in the interpretation of the SD instructions. Specifically, it
serves to transfer the quality of reciprocal antagonism from the XY
pairs to the instructions themselves.³ In effect, subjects are not
rating the concept on the XY scale but are instead rating the "net"
(Y minus X) qualities of the concept. If the Y qualities are dominant,
the Y half of the scale is checked. If the X qualities are dominant,
the X half of the scale is checked.

3 The author would especially like to thank Ledyard Tucker for his
patience in clarifying this problem.

The implications of this analysis for attitude measurement are
critical. That a subject may indicate ambivalence through a single mark
allows him to rate concepts on scales whether or not the defining XY
adjective pairs are judged as evaluatively bipolar. Consider, for
example, the adjectives "kind" and "wise," a pair evaluatively similar
(both positive) and not reciprocally antagonistic (i.e., many wise
people are also kind). Allowing ambivalent "neutral" responses,
however, makes the scale usable by forcing subjects to transfer their
rating from people to the net "kind minus wise" qualities of people. On
these "net" qualities, of course, it is true but trivial to say that
"kind" and "wise" are reciprocally antagonistic pairs (i.e., the Y
minus X characteristics of a concept cannot be both Y and X directed,
though the concept itself can be). Specifically, the consideration of
single-marking instructions and the defining of the neutral category as
"equally X and Y" allow (force?) an individual to cognitively redefine
each of the scale positions as follows:


(−3) extremely more X than Y        (3) extremely more Y than X
(−2) quite more X than Y            (2) quite more Y than X
(−1) slightly more X than Y         (1) slightly more Y than X
(0) equally X and Y (of which "neither X nor Y" is a special case)


Thus, the SD instructions, while intended to 
sidestep the concept of ambivalence, seem in 
actuality to have allowed it to creep in, un- 
measured, at every scale point and perhaps 
affect directional attitudinal markings as well. 
What is needed is a modification of the SD 
technique that will allow the isolation of the 
ambivalence factor in attitude. 
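
The way ambivalence hides inside a single mark can be illustrated with a small sketch. It assumes, purely for illustration, that the X and Y qualities of a concept each have a strength from 0 to 3 and that the single check mark reports their difference; the name `net_rating` and the 0-to-3 range are assumptions of this sketch, not part of the SD instructions.

```python
from itertools import product


def net_rating(x_strength: int, y_strength: int) -> int:
    """Single-mark SD response read as a net (Y minus X) judgment."""
    return y_strength - x_strength


# Group every hypothetical (X, Y) strength pair by the mark it would produce.
pairs_by_mark: dict[int, list[tuple[int, int]]] = {}
for x, y in product(range(4), repeat=2):
    pairs_by_mark.setdefault(net_rating(x, y), []).append((x, y))

for mark in sorted(pairs_by_mark):
    print(f"mark {mark:+d}: {pairs_by_mark[mark]}")
```

Under these assumptions the neutral mark is shared by the pair (0, 0), which corresponds to indifference, and by (3, 3), which corresponds to strong ambivalence, and every nonextreme mark likewise pools several different pairs.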

As a rough definition, Roger Brown's (1965) view of ambivalence as
"mixed feelings," positive and negative sentiments concentrated on the
same object, will suffice. Quantification of ambivalence, however,
requires a more precise working definition. A lead to this effect is
provided by J. Brown and Farber (1951) in their discussion of conflict
theory. Basically, they suggested that both the absolute and relative
strengths of the competing tendencies (i.e., approach and avoidance
tendencies) are important. Applying their reasoning to the area of
attitudes, Scott (1966) argued that the greater and the more equal the
opposite tendencies (i.e., the favorable and unfavorable components),
the higher is the degree of ambivalence. This working definition
demands a technique that allows the separate measurement of the Y and X
components. Such a procedure has markedly different demands than that
required to simply test for the bipolarity or reciprocal antagonism of
XY pairs. The latter procedure is exemplified by the previously
mentioned single-adjective technique (Green & Goldfried, 1965) and
involves only the gross test for the presence or absence of ambivalence
without any attempt at its quantification.


Single-Adjective Scale Technique 


Instead of using the standard SD scale that is comprised of adjectival
opposites (e.g., good-bad), Green and Goldfried measured the
connotative meaning of concepts by means of single-adjective scales
(e.g., good and bad). They did this by having individuals rate a
concept according to whether the scale was positively or negatively
related to it. As for the usual SD scale, only one check mark was
allowed. Specifically, the instructions have been described in detail:


If you feel that the descriptive adjective (e.g., good) is very
positively related to the word at the top of the page, you should place
your check mark as follows:

[scale display]

Or, if it is very negatively related, then:

[scale display]

If the descriptive adjective is quite positively (but not extremely)
related to the word, then:

[scale display]

Or, if it is quite negatively related, then:

[scale display]

If the descriptive adjective seems only slightly positively related
(but is not really neutral) to the word, then:

[scale display]

Or, if it is only slightly negatively related, then:

[scale display]

If you consider the word to be neutral on the adjective scale, or not
at all related to the word, then you should place your check mark in
the middle space [Green & Goldfried, 1965, p. 5].
As reported earlier, the criteria Green and Goldfried used for
bipolarity of XY adjective pairs are predicated on high negativity in
correlation between the X and Y single-adjective responses. Green and
Goldfried couch their criteria in more factor-analytic language:


If the semantic differential scales, and therefore also semantic space,
are actually bipolar, several correlational and factorial results
should follow. For example, analyses of sets of single-adjective
scales, each constructed from the same bipolar adjectival scale, should
have equal but opposite loadings on the same factor. To illustrate,
consider the good-bad scale, which has been found to have a .88 loading
on the evaluative factor (Osgood et al., 1957). If the bipolarity
assumption is valid, one would expect the good scale to have a loading
of approximately .88 on the evaluative dimension and the bad scale to
have a mirror-image loading, that is, −.88 [Green & Goldfried, 1965,
p. 4].


The rationale behind these criteria seems to be as follows: to the
extent that XY adjectival pairs are perceived as bipolar, individuals
will tend to implicitly insert Y at the open end of the X
single-adjective scale and X at the open end of the Y scale. For
example, if an individual perceives good and bad as reciprocally
antagonistic adjectives vis-à-vis a concept, he would tend to treat as
a bipolar good-bad scale both the good and bad single-adjective scales
(the good and bad responses should be negatively correlated). To the
extent that good and bad are not perceived as reciprocally
antagonistic, an individual's single-adjective good and bad responses
should tend to be relatively independent of one another (i.e., they
should be uncorrelated). Two explanations seem reasonable. Either the
individual might treat both single-adjective scales as unipolar or he
might define their open ends with other adjectives he might view as
reciprocally antagonistic (e.g., defining evil as the opposite of
good).

In no sense, however, can this above technique be viewed as capable of
separately diagnosing the individual's X and Y components making up any
overall (net) XY attitude and thus the amount of his ambivalence. This
is because the single adjectival X and Y ratings are hopelessly
confounded with the extent of XY reciprocal antagonism. What is needed
is a technique that allows separate diagnosis of the X and Y components
involved in any overall XY rating independent of the relation of the
members of the XY pairs to one another.

Attitudinal Component Technique: A Modification
of the Semantic Differential


To briefly summarize up to this point, we have argued that the
combination of single-marking instructions and double definition of the
neutral category allows (forces?) an individual to cognitively redefine
each of the single SD scale positions into net (Y minus X) judgments.
We have further argued that the measurement of ambivalence requires a
technique that will separate the X and Y components inherent in any
overall (net) attitudinal response. Green and Goldfried's
single-adjective scale technique for testing bipolarity has been
rejected as an ambivalence measurement on the grounds that it fails to
clearly isolate X and Y components. Allowing double marking on the
traditional semantic differential, an alternative not previously
discussed, can be rejected on these same grounds. However, it does
suggest a modification of the SD technique designed specifically to
separate out the positive (Y) and negative (X) components inherent in
any bipolar attitudinal response. The usual bipolar evaluative SD
scales are presented with typical instructions (i.e., single-marking
limitations plus dual definition of the neutral category). In
conjunction with these scales, however, subjects can be independently
presented with unipolar positive and negative component scales.⁴

That the component technique is not simply intended to generate
single-adjective scales in a slightly different guise (i.e., is devoted
to actually measuring X and Y components) is evidenced in the
instructions that emphasize the ignoring of opposing characteristics in
each unipolar response. Specifically, the component instructions are as
follows: on the positive component (Y) scale (i.e., the "liking"
scale), subjects are asked to make the following judgment. Considering
only the positive (Y) qualities of a concept and ignoring its negative
(X) ones, evaluate how positive (Y) its positive (Y) qualities are on a
4-point unipolar positive (Y) scale:

Y
0   1   2   3

where the categories are defined as follows:

(0) not at all Y (zero Y)
(1) slightly Y
(2) quite Y
(3) extremely Y

4 To give face meaning to the notion of positive and negative
components, the bipolar scales always run from −3 to +3 treating 0 as
the neutral point.


On the negative component (X) scale (i.e., the 
“disliking” scale), subjects are asked to make 
the following judgment. Considering only a 
concept's negative (X) qualities and ignoring 
its positive (Y) ones, evaluate how negative 
(X) its negative (X) qualities are on a 4-point 
unipolar negative (X) scale: 
X
−3   −2   −1   0

where the categories are defined as follows:

(0) not at all X (zero X)
(−1) slightly X
(−2) quite X
(−3) extremely X


Three independent measures are thus generated by this technique: first,
the usual bipolar attitude (A), second, its positive or "liking"
component (Ap), and third, its negative or "disliking" component (An).
Before we turn to the question of estimating ambivalence, it is
necessary to examine the stability of these measures and their
interrelationships. Let us turn to the first consideration. Reliability
coefficients for A, Ap, and An have been obtained over a variety of
"real attitude objects" (i.e., objects with naturally existent
cognitive supports) for six independent samples ranging in size from 67
to 236. For each sample, correlations are calculated across subjects
and objects. Test-retest reliabilities range from .81 to .93 for each
of the three scores (regardless of the exact nature of the underlying
evaluative scale),⁵ significant at well beyond the .001 level and
indicative of a high degree of stability.

5 While this technique seems applicable to any semantic dimension, it
is the evaluative dimension, of course, which is relevant to the
question of attitudinal ambivalence. Various evaluative scales
generated all the data presented in this article: Fishbein and Raven's
A scales (containing the scales good-bad, clean-dirty,
harmful-beneficial, healthy-sick, and wise-foolish), the good-bad scale
alone, and the like-dislike scale. As the intercorrelations between
each pair of scales consistently exceeded .80, data deriving from any
or all of these scales will be treated equivalently, that is, simply
labeled as A, Ap, or An.
The question of the interrelationships between A, Ap, and An is more
complex. Let us examine the Ap-An relationship first. Though these two
component scores are independently measured, it is by no means certain
that subjects are capable of making component rather than
single-adjective judgments (i.e., are capable of ignoring negative
qualities while making positive judgments, and vice versa). In their
single-adjective technique, Green and Goldfried (1965) offered a
negative correlation as the test for reciprocal antagonism between XY
pairs (i.e., for no ambivalence in the case of evaluatively loaded
pairs). To the extent that subjects treat Ap and An as single-adjective
scores, they too should be negatively intercorrelated. Zero
correlations, on the other hand, would indicate that Ap and An are
independent component scores (i.e., that the potential exists for
ambivalence).

The results (see Table 3) are quite gratifying in this regard. For the
same samples and attitude objects used in the calculations of
reliabilities (again objects with naturally existing cognitive
structures), the correlations between Ap and An ranged from −.13 to
[...] (r̄ = .05).⁶ In other words, subjects seem capable of treating
their component judgments independently even when the component scales
are presented in close physical proximity to each other. Informal
self-reports from subjects also indicate that they are, with little
difficulty, able to concentrate on one pole while ignoring the other.
That zero correlations are obtained through the components technique
while negative correlations are obtained through the single-adjective
technique suggests that the former may represent a more appropriate
approach to even the gross question as to the existence of ambivalence
independent of its quantitative properties.

6 The average correlations are estimated through z transformation.

That the two scales measure independent components is further supported
through examination of the underlying cognitive structure each of the
components reflects. Table 1 presents correlations between the directly
obtained liking (Ap) and disliking (An) components toward the attitude
objects and the estimates of these components based, respectively, on
the simple number of positively and negatively loaded beliefs (i.e.,
amount of positive and negative information) elicited by subjects about
the object. The results indicate that while the number of positive
(negative) beliefs provides a moderately good average estimate
(r̄ = [...], p < .01, in both cases) of absolute liking (disliking), it
provides only a very weak average inverse estimate (r̄ = −.16 and −.18,
respectively) of absolute disliking (liking).
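
The averaging of correlations through z transformation mentioned in the footnote can be sketched as follows. This is a minimal illustration that assumes Fisher's r-to-z transformation and, optionally, weights of n − 3 per sample; the article does not state whether its averages were weighted, and the function name `average_correlation` and the sample values in the usage line are hypothetical.

```python
import math


def average_correlation(rs, ns=None):
    """Average correlations by transforming to Fisher z and back.

    rs: correlation coefficients, each strictly between -1 and 1.
    ns: optional sample sizes; if given, each z is weighted by n - 3
        (an assumption of this sketch), otherwise weights are equal.
    """
    zs = [math.atanh(r) for r in rs]  # Fisher r-to-z
    weights = [1.0] * len(rs) if ns is None else [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)  # back-transform the mean z to r


# Hypothetical per-sample correlations and sample sizes.
r_bar = average_correlation([-0.13, 0.20], ns=[67, 236])
```

Averaging in the z metric rather than averaging the raw coefficients is the usual way to reduce the bias that arises because the sampling distribution of r is skewed.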
The second question of concern involves the relationship of Ap and An
in the determination of A and the extent of the correspondence between
A and its best component prediction. A noted attitude theorist, Jordan
(1965), has criticized an oversimplified "push-pull" model of attitudes
and suggested that positivity and negativity may work in different
ways. In like manner, several generations of linguists (e.g.,
Bierwisch, 1967; Greenberg, [...]; Lyons, 1963, 1968; Sapir, 1944) have
observed that word pairs like good and bad are asymmetrical because of
differential marking. They argue that good, the unmarked member of the
pair, can be neutralized, as in "How good was the movie?" whereas bad,
the marked term, cannot (see [...], 1969). The implication for the
present problem, of course, is differential weighting of the two
components. Across a variety of attitude objects for each of the
samples described above, however, there is a large range in the
influence of both Ap (r = −.15 to .59) and An (r = [...]) on A. Thus,
either component is capable of being either the dominant determinant of
overall attitude or, at times, a suppressor variable. In line with
these results, multiple regression analysis suggests both equal
weighting of Ap and An and a high correspondence between their
unweighted sum (A' = Ap + An) and A (r = .89 to [...]). Thus, the
independently derived A score can be tentatively interpreted as the
simple unweighted net summary of the Ap and An components (A = A'). A
positive A indicates that, overall, the concept is more liked than it
is disliked, a negative A indicates the opposite, and a zero A
indicates that the liking and disliking components are equal. The
presentation of the bipolar SD scale, of course, becomes superfluous as
it is possible to diagnose it from the component scores. In the
remainder of this article, in any case, no distinction is made between
A and A' = Ap + An.


TABLE 1

AVERAGE CORRELATIONS BETWEEN ABSOLUTE ATTITUDE COMPONENTS AND NUMBER OF
SUPPORTING BELIEFS

Number of beliefs    Ap                          |An|
Positive             [...] (.31** to .52**)      [...] (−.11 to −.23)
Negative             −.16** (−.02 to −.37**)     [...] (.32* to .51*)

Note. Ranges are in parentheses. These correlations were calculated
across subjects and objects for each of three independent samples
ranging in size from 67 to 172 [...]. The significance test for each of
these correlations is based on the respective sample size it is
calculated on. Average correlations, in contrast, are calculated
through z transformations and based on 311 degrees of freedom ([...],
1962, p. 140).
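
The correspondence between the unweighted sum A' = Ap + An and the directly rated A can be checked with an ordinary product-moment correlation. The sketch below uses invented ratings solely to show the computation; the data, the tuple layout, and the name `pearson_r` are assumptions for illustration and do not reproduce any result reported in the article.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings: (Ap on 0..3, An on -3..0, directly rated A on -3..3).
ratings = [(3, 0, 3), (2, -1, 1), (3, -3, 0), (1, -2, -1),
           (0, -3, -3), (2, -2, 1), (1, 0, 1)]

a_prime = [ap + an for ap, an, _ in ratings]   # unweighted net summary A'
a_direct = [a for _, _, a in ratings]          # bipolar SD rating A
r = pearson_r(a_prime, a_direct)
```

A value of r close to 1 in real data would support treating A and A' interchangeably, as the text proposes.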


Attitude Component Model: The Development of
Nondirectional Attitude Indexes


The foundation of our component technique laid, let us turn to the
problem of developing an index of ambivalence and a model wherein such
an index can be directly interpreted. To begin with, the demonstration
of the relative independence of the Ap and An components and the high
correspondence between their unweighted sum and A, the overall
attitude, allow us to draw Ap and An as orthogonal dimensions. Thus, in
Figure 1, Ap is represented by the ordinate and An by the abscissa,
allowing the representation of every attitude object by a point in the
2-space (An, Ap). If the two components are scaled with equal units,
the 45-degree line corresponds to a zero marking on the bipolar SD
scale and can be labeled the line of ambivalence (A = Ap + An = 0).
Three nondirectional attitude measures become directly interpretable
around this line: total affect (TA), polarization (POL), and
ambivalence (AMB).

Of these three indexes, TA is the most inclusive, referring simply to
the total amount of affect directed toward the object regardless of
sign. It is calculated through summing the absolute values of the
liking and disliking components (TA = Ap + |An|)⁷ and is easily
interpretable via the components model presented in Figure 1. The
origin (0,0) is the only point on the line of ambivalence wherein an
interpretation of "no affect" or indifference is justified
(TA = Ap + |An| = 0). All other points on this line, in contrast,
indicate that an individual does possess both positive and negative
affect (the farther out along the line the greater the affect), which
happen to cancel out.


[Figure 1: ordinate, Ap (liking component); abscissa, An (disliking
component). Legend: A = A' = Ap + An; TA = Ap + |An|; POL = |A|;
AMB = TA − POL.]

Fig. 1. A geometrical representation of the attitude component model.

For points on the line of ambivalence (i.e., whenever A = 0), TA serves
as a good indicator of ambivalence (i.e., the total amounts of liking
and disliking completely cancel out). Clearly, however, the component
model is not limited to the line of ambivalence. It is intended to
describe all points in the 2-space, the projection of every attitude
object falling on some line (A = a) parallel to the line of ambivalence
(a equals zero for the line of ambivalence itself). When the positive
component of an attitude is greater than its negative component, A is
positive and the point representing this attitude will fall above the
line of ambivalence (the 45-degree line). When the opposite is true, A
is negative and the point will fall below the line of ambivalence.
Perpendicular to the family of A lines (A = a) lies the family of TA
lines (TA = b). Every attitude score may be described as lying on the
intersection of a line from the TA family with a line from the A
family. TA does not itself provide a pure index of ambivalence for
points not falling on the line of ambivalence. Though the individual
may possess both positive and negative affect, the nonzero net attitude
indicates that the two components do not completely cancel out. Thus,
TA contains two distinct segments: AMB, or the amount of exactly
counterbalancing positive and negative affect, and POL, or the absolute
value of the directional residual. On the line of ambivalence, POL is
zero and AMB = TA. For other points, however, AMB = TA − POL.

7 As Ap ≥ 0 and An ≤ 0, placing an absolute value around Ap is
superfluous.


In other words, for a neutral SD response, AMB represents the total
underlying affect (TA). For nonneutral responses, however, AMB
represents only one segment of TA; a second segment is provided by the
absolute value of the directional score itself (POL).

Further examination of Figure 1 provides a geometric interpretation of
each of these indexes. A perpendicular may be dropped from any attitude
point in the 2-space to the axis (ordinate or abscissa) on the opposite
side of the line of ambivalence. The length of this perpendicular from
the attitude point to its intersection with the line of ambivalence
represents an index of the POL of this point itself; the length of the
line of ambivalence from the intersection to the origin represents a
rough index of its AMB derivative. TA, of course, is indexed by the
direct line between the attitude point and the origin.⁸
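
The geometric reading can be made concrete with a short sketch. It assumes the point is plotted at (|An|, Ap), with disliking on the abscissa and liking on the ordinate, and that lengths are ordinary Euclidean distances; the function name `geometric_lengths` is hypothetical.

```python
import math


def geometric_lengths(ap: int, an: int):
    """Lengths of the three line segments described in the text.

    The attitude point is taken to be (x, y) = (|An|, Ap). Dropping a
    perpendicular to the axis on the far side of the 45-degree line
    meets that line at (m, m), where m = min(x, y).
    """
    x, y = abs(an), ap
    m = min(x, y)
    pol_length = max(x, y) - m      # point to the line of ambivalence
    amb_length = math.hypot(m, m)   # along the line, back to the origin
    ta_length = math.hypot(x, y)    # direct line from point to origin
    return pol_length, amb_length, ta_length


# Example: Ap = 3, An = -1, for which TA = 4, POL = 2, AMB = 2.
pol_length, amb_length, ta_length = geometric_lengths(3, -1)
```

For this example the first length equals POL exactly, while the other two come out smaller than AMB and TA (about 1.41 against 2, and about 3.16 against 4), which is consistent with the article's remark that the latter two lines are underestimates used for simplicity.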

The relationships between these indexes may be formally summarized as
follows:


A = A' = Ap + An        [1]
TA = Ap + |An|          [2]
POL = |A|               [3]
AMB = TA − POL          [4]
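
Equations 1 to 4 translate directly into a few lines of code. The sketch below is illustrative only: the names `Indexes` and `attitude_indexes` are assumptions, and the grid simply enumerates every combination of the 4-point liking and disliking scales described earlier.

```python
from typing import NamedTuple


class Indexes(NamedTuple):
    a: int    # net attitude, Equation 1
    ta: int   # total affect, Equation 2
    pol: int  # polarization, Equation 3
    amb: int  # ambivalence, Equation 4


def attitude_indexes(ap: int, an: int) -> Indexes:
    """Compute A, TA, POL, and AMB from liking (Ap >= 0) and disliking (An <= 0)."""
    if ap < 0 or an > 0:
        raise ValueError("expected Ap >= 0 and An <= 0")
    a = ap + an
    ta = ap + abs(an)
    pol = abs(a)
    amb = ta - pol
    return Indexes(a, ta, pol, amb)


# Every combination of Ap in 0..3 and An in 0..-3.
grid = {(ap, an): attitude_indexes(ap, an)
        for an in (0, -1, -2, -3) for ap in range(4)}

for (ap, an), idx in grid.items():
    print(f"Ap={ap} An={an:+d}  A={idx.a:+d} TA={idx.ta} POL={idx.pol} AMB={idx.amb}")
```

Keeping the four indexes in one named tuple makes it easy to confirm, for any pair of component scores, that total affect splits exactly into its polarized and ambivalent parts.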


It can be seen that POL = |Ap + An| and AMB = [Ap + |An|] - |Ap + An|. Thus, if Ap = -An, then A = 0, POL = 0, and TA = AMB. If either Ap or An = 0, then AMB = 0 and TA = POL. A more complete delineation of the effects of hypothetical Ap-An combinations on TA, POL, and AMB is presented in Table 2. Theoretically, at least, A can vary somewhat independently of TA, POL, and AMB. The latter three indexes, because of the nature of their derivations, must be somewhat interrelated; specifically, TA must be positively related to both POL


PRECES 

n 
or 4 Oe ÉL of these latter two lines, as any student 
iB ane $ already. aware, represents underestimates 


Dres, 8nd estimates 
These estimates are used for simplicity of presentation.


TABLE 2


HYPOTHETICAL Ap-An COMBINATIONS AND THEIR
EFFECTS ON OTHER ATTITUDINAL INDEXES


                             Ap (Liking)
An (Disliking)   Index     0     1     2     3

      0          A(a)      0     1     2     3
                 TA(b)     0     1     2     3
                 POL(c)    0     1     2     3
                 AMB(d)    0     0     0     0

     -1          A        -1     0     1     2
                 TA        1     2     3     4
                 POL       1     0     1     2
                 AMB       0     2     2     2

     -2          A        -2    -1     0     1
                 TA        2     3     4     5
                 POL       2     1     0     1
                 AMB       0     2     4     4

     -3          A        -3    -2    -1     0
                 TA        3     4     5     6
                 POL       3     2     1     0
                 AMB       0     2     4     6

a A = Ap + An.
b TA = Ap + |An|.
c POL = |A|.
d AMB = TA - POL.


and AMB (its two contributors), while they 
themselves must be negatively related. Under 
high POL, little AMB can occur. Under high 
AMB, little POL can occur. Under moderate 
ranges, however, both AMB and POL can 
occur. The empirical question, of course, is the relative occurrence of Ap-An combinations as naturally existing states and their effects on the interrelationships of the other indexes (e.g., whether POL or AMB is the more important contributor to TA).

As previously reported, Table 3 indicates the relative independence of the Ap and An components. As expected, TA, POL, and AMB are all relatively independent of directional attitude (A) (r = -.12, -.07, and -.10, respectively; df = 737 in all cases), yet moderately related to the absolute values of both liking (Ap) (r = .71, .45, and .54, respectively; p < .01 in all cases) and disliking (An) (r = .60, .54, and .35, respectively; df = 737, p < .01 in all cases). It should be noted, however, that a considerable range in each of these correlations obtains across the six samples. The variability seems to be a function of the relative strengths of the Ap and
TABLE 3


AVERAGE INTERCORRELATIONS OF ATTITUDE COMPONENT INDEXES


p POL 
Measure | TA AMB è 
A —.12** —.07 —.10* 
à (—.47**, .22) (—.36**, .18) (—.48**, .24) 
1 .71** 45** ,S4** 
i (.60**, .90**) (29%, 62"*) (aate, 7095) 
1 .60** Ed iste 
ije (A41**, .80**) G3srt FT) (.08, .42**) 
TA l 35** ave 
l (.82,,75**) (.38, .64**) 
—40** 
AMB 
5 
POL 


Note.—Ranges are in parentheses. These correlations wi 
measured samples ranging in size from 67 to 236 (total N 
respective sample size it is calculated on. The average corre! 
based on 737 degrees of freedom (see McNemar, 1962, p. 140). 

* p < .05.

** p < .01.


An components across objects. The objects 
themselves consist of ethnic categories, insti- 
tutions, current events, and specific persons. 
Internal analysis suggests some general though 
by no means exceptionless tendencies. For 
specific people and ethnic categories, "warm is the norm" (little An); for institutions, in contrast, the opposite pattern seems to hold (little Ap). This is hardly surprising, as our subjects are predominantly students in their late teens or early twenties. The remaining category, current events, seems to be treated more evenhandedly (eliciting both Ap and An).
Though no one sample contains a monopoly 
on any one type of attitude object, the rela- 
tive occurrence of the various types (specifically 
ethnic categories versus institutions) does 


TABLE 4 


AVERAGE CORRELATIONS BETWEEN ATTITUDINAL
INDEXES AND NUMBER OF SUPPORTING BELIEFS


Measure | Total number of beliefs Range 
A | .04-.07 
TA 42651 
POL | 01—14 
AMB | 02- 45* 


bjects 
red samples 
- A significance 
sed on the ie pecHve 
verage correlations, in 
meted b calculated through 7 to z transformation: 
$n, 311 degrees of freedom (see Mc 


and 
Nemar, 1962, p. 140). 


s subjects and objects for each of six i 
ch of these correlations is b: 


differ markedly across categories in a manner consistent with this pattern. In line with these results are the relative contributions of POL and AMB to TA (ranging from .35 to .75 for AMB and from .38 to .64 for POL). It is true but trivial to point out that with objects eliciting high polarization and low ambivalence, POL becomes the more important contributor to TA. For highly ambivalence-producing objects, AMB becomes the more important contributor.

As the reader will no doubt recall, the validity of the Ap and An scores as components was buttressed by examination of their respective cognitive supports (i.e., number of positive and negative beliefs). In like manner, such analyses were performed on A, TA, POL, and AMB. The theoretical point of departure derives from an observation by Brehm and Cohen (1962).

It is generally assumed that as a person's attitude varies from neutral in either direction, he is likely to collect increasing amounts of relevant information [greater number of beliefs] about the issue. We hasten to point out, however, that this correlation does not always hold, since it is perfectly possible for a person to hold a "neutral" position because he possesses conflicting information [Brehm & Cohen, 1962, p. 14].


The implication of this suggestion for the components model is straightforward. While number of beliefs should be directly related to the TA and AMB indexes, which do not cancel conflicting attribute evaluations, it should be unrelated to A and POL. The results in Table 4 indicate that this indeed seems to be the case for naturally existing objects, though these again are variable across samples, with highly polarized objects tending to raise the correlations of A and POL with number of beliefs and lower its correlation with TA and AMB. This trend is even more evident when the number of beliefs and the polarities of their evaluative loadings are under experimental control. Initial results indicate that the presentation of sets of highly polarized but directionally balanced attributes (varying in size from 6 to 12) generates higher TA and AMB toward an abstract object than equally sized, directionally balanced attribute sets of lesser polarization (p < .01 in both cases). Both sets of attributes, however, tend to generate near neutral directional attitudes (A) and polarizations (POL).

One further application of the component technique should be mentioned in closing. In an investigation of the contact hypothesis (see Amir, 1969; Heider, 1958; Newcomb, 1961), the author, along with several colleagues, obtained Ap and An as well as A measures. Their results indicated a predicted interaction on overall attitude (A) between the formality-informality of an attitude object and the degree of subjects' potential contact with it. Specifically, subjects' attitudes became less favorable overall toward the formal object and more favorable overall toward the informal object with increasing potential for contact (decreasing distance) (F = 29.39, df = 3/627, p < .001).

Use of the liking (Ap) and disliking (An) components enabled us to pinpoint the locus of these effects. Subjects' liking remained relatively constant with decreasing distance to the formal object but tended to increase with nearness to the informal object (F = 12.68, p < .001). On the other hand, subjects' disliking increased with nearness to the formal object, while remaining relatively constant with decreasing distance from the

o— 
study origin, 
dy originally reporting (hese rest 


tg, "e 
Aui bject p 
at de "d l'ormality- Informality and Resultant 
Riep Junction ont Slopes,” was prepared by the author 
DA Segan pos Ira Firestone, Melvin Kimmel, and 
Stern, Ck presentation at the 1970 
S¥chological Association. 


Neap 
dp = 
Sub 

th 


ilis, *Atti- 
informal object (F = 12.95, df = 2/627, p < .001).

In the context of this particular study, the 
results tend to support a little-noted suggestion 
that neither positive nor negative feeling is 
likely to decrease with increased contact (Park 
& Burgess, 1927, p. 283). Of more general 
importance, however, is the evidence that the 
component technique allows us to distinguish 
between increasing liking (disliking) and de- 
creasing disliking (liking) as the source of 
positive (negative) attitude change. 


NOTE ON NONORTHOGONAL ANALYSIS OF VARIANCE


ROBERT R. RAWLINGS, JR.


National Institute of Mental Health, Chevy Chase, Maryland 


In this note a recent article that discussed three methods for performing nonorthogonal analysis of variance is criticized. It is observed that the statistics obtained do not provide exact tests for main effects when one is assuming an interaction model. An alternative method is presented for treating nonorthogonal analysis of variance which uses an existing general linear model program.


The purpose of this note is to present a critique of a method for performing nonorthogonal analysis of variance (ANOVA) which was presented in a recent article in this journal (Overall & Spiegel, 1969) and to describe an alternative method for performing nonorthogonal ANOVA. First, the full rank model, as discussed in Draper and Smith (1966, pp. 69-70, 74-77), is considered:


$$Y = X\beta + e,$$

where $X$ is $n \times p$, $E(e) = 0$, and $E(ee^T) = \sigma^2 I$.

Now divide $X$ into $t$ sets of columns $(x_1, \ldots, x_t)$, and make the corresponding partition of $\beta^T = (\beta_1^T, \beta_2^T, \ldots, \beta_t^T)$. If $x_i^T x_j = 0$ for all $i \neq j$, which implies that $X^T X$ is diagonal in blocks, then we find that

$$b^T X^T Y = \sum_{i=1}^{t} b_i^T x_i^T Y,$$

where $b_i$ is the least squares estimate of $\beta_i$. The residual sum of squares is given by $SS_R = Y^T Y - b^T X^T Y$. Now consider the test of the hypothesis $\beta_1 = 0$. Under the hypothesis, the residual sum of squares is given by

$$SS_H = Y^T Y - \sum_{i=2}^{t} b_i^T x_i^T Y.$$

The difference $SS_H - SS_R = b_1^T x_1^T Y$ is the sum of squares due to the hypothesis. Recall that a multiple correlation can be computed as

$$R^2 = \frac{b^T X^T Y - n\bar{Y}^2}{Y^T Y - n\bar{Y}^2},$$
Requests for reprints should be sent to Robert R. Rawlings, Jr., National Institute of Mental Health, Computer Systems Branch, 5454 Wisconsin Avenue, Chevy Chase, Maryland 20015.

where $Y^T Y - n\bar{Y}^2 = SS_T$ = total sum of squares adjusted for the mean. Under the hypothesis $\beta_1 = 0$, we obtain

$$R_H^2 = \frac{\sum_{i=2}^{t} b_i^T x_i^T Y - n\bar{Y}^2}{Y^T Y - n\bar{Y}^2}.$$

Thus we can obtain the sum of squares due to the hypothesis as $SS_T(R^2 - R_H^2)$.
It should be stressed that, assuming a nonorthogonal interaction model, we cannot obtain sums of squares for main effects in this manner since orthogonality conditions are not satisfied. For a discussion of the tests for main effects in the nonorthogonal interaction model, refer to Scheffé (1959, pp. 116-119). Although Methods II and III discussed in Overall and Spiegel do provide tests of hypotheses, these tests are not tests of main effects. In their article, Method I corresponds to the exact method given in Scheffé for the interaction model. Using these methods, one first reduces the original design matrix to a full rank matrix by making the usual ANOVA assumptions and redefining parameters. One then uses the above described procedure to obtain sums of squares for hypotheses. It is demonstrated by means of an example that this procedure need not result in the correct sums of squares for a nonorthogonal design.

Let us consider the following design where for the sake of brevity we have not indicated the interaction terms.
ri i $ 0 9 
11020 0 
| 1001 O;fn 
yi hi i49 8 1| |a H 
"óuerreoepe[ 
H 10 1 1 0 Oji 5 
à 101 1 0 Of | ds 
10 10 1 OF (bs 
101001 


$$y = \mu x_1 + a_1 x_2 + a_2 x_3 + \cdots + b_3 x_6 + e.$$


Now make the usual ANOVA assumptions $a_2 = -a_1$, $b_3 = -b_1 - b_2$ and substitute these values for $a_2$ and $b_3$ into the above equation:


$$y = \mu x_1 + a_1(x_2 - x_3) + b_1(x_4 - x_6) + b_2(x_5 - x_6) + e.$$


We now have the full rank model $y = Z\alpha + e$,


T d led 
lh. $ iei 
1 L-i 1]|fu 
yı 1 1-1-1] /a, [71 
HE 1.— i (| 
y» i-i 11] b £y 
iei 4 41 
1—1-1 1| 
=i =1 = 
Since we have 
o= is 
maj- 9-4 1 
L= 9-5 
Ls 1-5 © 


it is seen that orthogonality conditions are 
not satisfied, so that the use of the above 
procedure does not result in exact statistics, 

A procedure for performing nonorthogonal 

ANOVA, which uses existing general linear 
hypothesis programs, is now described. The 
following method, which was discussed in 
Scheffé (1959, pp. 112-116), is suggested: 
(a) test for lack of fit for the noninteraction 
model; (b) if there is no lack of fit, then test 
for main effects.

The lack-of-fit test that is now discussed is described in Scheffé, but in a different manner. Consider the model $Y = X\beta + e$ where this is a less than full rank model. Let $y_{ij}$ denote the $j$th repeated observation for cell $i$, where $i = 1, \ldots, k$ and $j = 1, \ldots, n_i$. Let $\bar{y}_i$ denote the sample mean for cell $i$. Now define the total sum of squares for pure error,

$$SS_E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2,$$

with

$$df = n_E = \sum_{i=1}^{k} (n_i - 1).$$

The sum of squares for lack of fit is given by $SS_{LF} = Y^T Y - b^T X^T Y - SS_E$ with $df = n - r - n_E$, where $r$ is the rank of $X$ and $b$ is any solution of the normal equations for the model. Then the statistic

$$F = \frac{SS_{LF}/(n - r - n_E)}{SS_E/n_E}$$

has an F distribution with $n - r - n_E$ and $n_E$ degrees of freedom.
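
The lack-of-fit computation described above can be sketched in plain Python. This is an illustration under stated assumptions, not the program referred to in the text: the function names `solve_linear_system` and `lack_of_fit` are hypothetical, the model is supplied as a full rank (reparameterized) design so that the rank equals the number of columns, and both degrees of freedom are assumed to be positive. The residual sum of squares is the same for the full rank and the less than full rank forms of the model.

```python
def solve_linear_system(matrix, rhs):
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is not of full column rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * solution[k] for k in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def lack_of_fit(cells, design):
    """Return (SS_LF, df_LF, n_E, F) for a no-interaction model.

    cells:  maps a cell label to its list of repeated observations.
    design: maps the same label to that cell's row of the full rank design.
    """
    rows, ys = [], []
    ss_pure, df_pure = 0.0, 0
    for key, observations in cells.items():
        mean = sum(observations) / len(observations)
        ss_pure += sum((y - mean) ** 2 for y in observations)  # pure error
        df_pure += len(observations) - 1
        for y in observations:
            rows.append(design[key])
            ys.append(y)
    p = len(rows[0])  # rank of the full rank design
    ztz = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    zty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    a = solve_linear_system(ztz, zty)  # least squares estimate
    ss_resid = sum(y * y for y in ys) - sum(ai * ti for ai, ti in zip(a, zty))
    ss_lof = ss_resid - ss_pure
    df_lof = len(ys) - p - df_pure
    f_ratio = (ss_lof / df_lof) / (ss_pure / df_pure)
    return ss_lof, df_lof, df_pure, f_ratio


if __name__ == "__main__":
    # Hypothetical unbalanced 2 x 2 layout with effect-coded columns (mean, A, B).
    cells = {"a1b1": [1, 3], "a1b2": [3, 5, 4], "a2b1": [5], "a2b2": [6, 8]}
    design = {"a1b1": [1, 1, 1], "a1b2": [1, 1, -1],
              "a2b1": [1, -1, 1], "a2b2": [1, -1, -1]}
    print(lack_of_fit(cells, design))
```

In this constructed example the cell means (2, 4, 5, 7) are exactly additive, so the lack-of-fit sum of squares is zero up to rounding, with 1 and 4 degrees of freedom.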

This test is described by Scheffé as a test for interactions, but it actually is a lack-of-fit test as used in regression analysis. If we now consider the reparameterized full rank model $Y = Z\alpha + e$, we can use the results presented in Graybill (1961, pp. 138-139) to obtain tests for main effects. A general linear hypothesis program (Dixon, 1968, pp. 543-557) can be used to obtain the proper sums of squares by this method. It should be recalled at this point that $Y^T Y - b^T X^T Y = Y^T Y - a^T Z^T Y$, where $a$ is the least squares estimate of $\alpha$. If we enter the design matrix $Z$ in the above program, we can obtain $a$. By specifying the proper hypotheses, we can obtain the correct sums of squares for tests of main effects. For clarity, let us consider the previous example. We can test for an A effect by specifying the hypothesis $a_1 = 0$. Likewise, we can test for a B effect by specifying the hypothesis $b_1 = b_2 = 0$. That these tests are correct follows from the fact that the hypotheses are true if and only if $a_1 = a_2$ and $b_1 = b_2 = b_3$, respectively, when the usual ANOVA assumptions are made.

A TABLE FOR THE CALCULATION OF d' AND β


LARRY HOCHHAUS


Iowa State University


A compact table is presented which will permit precise calculation of signal detection parameters, d' and β, for hit and false alarm rate combinations in the range from .01 to .99.


At this point in the history of psychology the methodology of signal detection theory has become established as a preferred technique for assessing a subject's ability to discriminate the occurrence of discrete binary events. The primary argument for its use has been the provision for a discrimination index (d') which is independent of response bias (β) factors. Bias refers to the fact that, independent of the stimulus, not all responses are equally likely, and as such bias should not be confounded with d'. While a number of authors (e.g., Green & Swets, 1966; Swets, 1964) have elaborated the rationale of the theory, tools for the application of signal detection theory methods are not as readily available. It is the purpose of this article to present one such tool that can be used for the calculation of d' and β.

According to the signal detection theory model (e.g., Swets, 1964), only two values need be observed to obtain estimates of the d' and β parameters. These are the hit rate (HR), the proportion of signals presented that are affirmed by the subject, and the false alarm rate (FAR) = $P_N(A)$, the proportion of times that a signal is reported when no signal was actually presented. The hit rate and false alarm rate are transformed to d' and β by the following formulas:


$$d' = \mathrm{ABS}(HR) - \mathrm{ABS}(FAR), \qquad [1]$$

$$\beta = \mathrm{ORD}(HR)/\mathrm{ORD}(FAR), \qquad [2]$$



where ABS(p) and ORD(p) are the abscissa and ordinate values of the standardized normal distribution, respectively, which are given in Table 1 for an orderly set of p values ranging from .50 to .99. Table 1 has been abstracted from Pearson (1931) and gives abscissa and ordinate values accurate to 5 and 10 decimal places, respectively. For values of p < .50 the following transformation should be used:

ABS(p) = -ABS(1 - p)

and

ORD(p) = ORD(1 - p).
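
As a hedged illustration of the two formulas, the table lookup can be replaced by the standard normal functions in Python's `statistics` module. The function names below are assumptions made for this sketch and are not part of the article; the sketch assumes both rates lie strictly between 0 and 1, and the symmetry rules for p < .50 are handled automatically by the inverse normal function.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def abscissa(p: float) -> float:
    """ABS(p): the z value below which a proportion p of the area lies."""
    return _STD_NORMAL.inv_cdf(p)


def ordinate(p: float) -> float:
    """ORD(p): the height of the standard normal density at ABS(p)."""
    return _STD_NORMAL.pdf(abscissa(p))


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Discrimination index: ABS(HR) - ABS(FAR)."""
    return abscissa(hit_rate) - abscissa(false_alarm_rate)


def beta(hit_rate: float, false_alarm_rate: float) -> float:
    """Response bias: ORD(HR) / ORD(FAR)."""
    return ordinate(hit_rate) / ordinate(false_alarm_rate)


if __name__ == "__main__":
    print(round(d_prime(0.85, 0.70), 3), round(beta(0.85, 0.70), 3))
```

For a hit rate of .85 and a false alarm rate of .70, the sketch gives d' of about .512 and β of about .671.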


Thus Table 1 provides a large scope and
permits the precise calculation of d' and β


TABLE 1 


NORMAL DISTRIBUTION FUNCTIONS FOR 
PERCENTILES OF AREA 


Lu ABS" ORD* ? ORD*« 


.30804 22804 
30881 69424 | 


“00000 


51777 65727 
«31087 31933 
30364 80841 
609 35838 
820 12820 
.27996 19204 
127136 52755 
| :26240 00175 
| 125305 35384 
| 224331 17408 


115097 
117637 | 


23315 87758 
22257 67101 
21154 51092 


34230 AOS Æ 
A3662 33449 7 x 
33064 55199 | -98 | 2.0527 
"33436 52180 | .99 | 2.32688 
Note. Reprinted by permission from Tables for Statisticians and Biometricians by Karl Pearson. Copyright by the Cambridge University Press, 1931.
a The larger proportion in a dichotomized standard normal distribution.
b ABS = the abscissa from the mean to the point of dichotomy in the standard normal distribution (negative for values of p < .50).
c ORD = the ordinate at the point of dichotomy in the standard normal distribution.

1 Supported by a Public Health Service research grant from the National Institute of Mental Health.

TABLE 2

ILLUSTRATIVE EXAMPLES OF THE CALCULATION OF d' AND β

Subject   HR    FAR   ABS(HR)    ABS(FAR)    d'     ORD(HR)        ORD(FAR)        β
   1      .85   .70   1.03643     .52440    .512   .23315 87753   .34769 26142    .671
   2      .60   .40    .25335    -.25335    .507   .38634 25335   .38634 25335   1.000
   3      .30   .15   -.52440   -1.03643    .512   .34769 26142   .23315 87753   1.490

Note. HR = hit rate; FAR = false alarm rate; ABS = abscissa; ORD = ordinate.


for many typical combinations of hit rate 
and false alarm rate; should the researcher 
require more precise gradations of hit rate 
and false alarm rate, original tables of the 
normal distribution should be consulted 
(Kelley, 1938; Pearson, 1931). To illustrate 
the use of the tabulated values, three example 
hit rate and false alarm rate combinations 
have been chosen, and the steps for calculating
d' and β are shown in Table 2.
OF ACHIEVEMENT MOTIVATION


| Bure 
bility of fantasy-based achievement 
s for such measures are suggested, 
ented. Analysis of both pub- 
reliability of these 
motive measures to 
nds of low reliability of the 
table in terms of failure 


outcome. In short, measures of motivation are 


required. 

The need for measures of achievement motivation has seemed particularly cogent because "achievement motivation" captures what both the academician and the man on the street see as important ingredients for success: persistent striving toward a goal, overcoming of obstacles, identification with career goals rather than personal or affectional goals, willingness to work hard. Unfortunately, need for a commodity does not necessarily guarantee the feasibility or even the possibility of producing that commodity. The literature reveals, on scrutiny, serious misgivings on the part of many investigators about the viability of achievement motivation
i n empirically rep ica- 
retical em - at l Pall, Katkov- 
po pi 1960; Jensen, 1959; Katz, 
SK; "S à 

"7. Klinger Mitchell, 1961; M. 
D po " solomon, 1968; Musee 
1969). Without commenting directly on views expressed by these authors, in this article I suggest that fantasy-based measures of achievement motivation have psychometric liabilities that are probably sufficient in themselves to lead to the many weaknesses noted by others.

Many tests of achievement motivation are currently used in the United States. They divide roughly into "objective," multiple-choice, short answer, and the like, and "projective," including fantasy based. The weaknesses others have found to be associated with objective tests are briefly noted. The weaknesses associated with fantasy-based measures are the main topic of this report, and it is suggested that these weaknesses have serious consequences for the prediction of school performance, a use to which fantasy-based measures have been increasingly put, especially in trying to upgrade the education of disadvantaged or minority group children.

The most serious criticisms of objective 
tests are raised in the context of their format: 
even though the tests are of satisfactory reli- 
ability, subjects may be unaware of, or unable 
to report on, their motivational states. This 
issue is one that will not be resolved soon, and is not central to the concerns of this article. It is not discussed, therefore, but even if it could be settled, objective tests are suspect, in that they seem to correlate negligibly with one another although purportedly measuring similar things, and they have been shown (Buxton, 1967) to be highly susceptible to "faking."

Fantasy-based measures of need achieve- 
ment consist of sets of stories written to 
(usually four) Thematic Apperception Test 
(TAT) cards. The stories are written by subjects in a standard time and are content analyzed according to rules enunciated by Atkinson (1958), or rules derived closely therefrom. The remainder of this article is directed toward documenting the low reliability of fantasy-based measures of need achievement.

In spite of a voluminous literature, the 
reliability of fantasy-based need achievement 
tests has received scant notice, except in terms 
of interscorer agreement. Using data from a large survey of ninth graders (Entwisle & Greenberger),3 coefficients of internal consistency were found to be about .30. This finding prompted further investigation. As the reader will see, study of both published and unpublished data of other workers suggests that low reliability may be typical of fantasy-based measures and that some of the perplexing findings in the need achievement literature can be resolved in light of the low reliability.

The rest of this article has the following plan: (a) general consideration of measures of reliability and their application to fantasy-based tests; (b) reliability analyses of several bodies of empirical data; (c) interpretation of the achievement motivation literature in light of findings presented in this paper; and (d) discussion and implications.

3 D. R. Entwisle and E. Greenberger, A survey of cognitive style in Maryland ninth-graders: I. Achievement motivation, productivity. Baltimore, Md.: Center for Social Organization of Schools, The Johns Hopkins University, 1970.


MEASURING RELIABILITY 


In its broadest sense, reliability defines the generalizability of a test score (see Gleser, Cronbach, & Rajaratnam, 1965). A subject's test score is a sample from a universe of possible observations of his performance on the trait or characteristic being measured. Every test score involves variable aspects not specified in the operational definition of the procedure. In this sense "unwanted variance" comes from many sources, and each definition of errors changes the definition of the reliability coefficient.

For fantasy-based measures of achievement motivation the following incomplete list suggests some sources of "unwanted variance": variation from one test occasion to another, variations from one scorer to another, and variation from one version of the instrument to another (different sets of pictures), the most basic because it is conceptually prior to the others. Only if test homogeneity is sufficient is it worthwhile to investigate other facets of reliability by extending generalizability and, after that, to search for validity information.

This brief rationale suggests why reliability estimates can be computed from several starting points and why the one chosen depends on the purpose at hand. Ordinarily, as mentioned, a limited view of generalizability is adopted, and reliability of fantasy-based tests is estimated in one of the following ways.


Testing and Retesting the Same Persons on Two Occasions

For fantasy measures where respondents write "imaginative" stories based on pictures and each story typically takes about 4 minutes to write, testing and retesting the same individuals with the same pictures has drawbacks. One would expect very specific memory and carry-over effects from one occasion to the next. When it has been tried it has led to low correlations, perhaps because respondents feel they should do something "different," although, for all we know, specific memory carry-over could lead to spuriously high correlations.


Split-Half and Equivalent Forms

These methods have been used occasionally with fantasy-based measures. The number of pictures to which subjects can respond is limited, six probably being an upper limit. This limitation seriously hampers the split-half approach. The development of equivalent forms for testing is tedious, and, so far at least, estimates obtained this way are low. Most of the equivalent-forms estimates also include occasion-to-occasion variance because the forms are administered at different times.
Homogeneity Estimates (Internal Consistency)

Estimating reliability in the particular ways used here has apparently not been done before. There seems no reason not to use item statistics to estimate internal-consistency reliability. As is seen, reliabilities estimated this way represent an upper bound to estimates using other approaches (Nunnally, 1968).

RELIABILITY USING "ITEM" STATISTICS FOR FANTASY-BASED MEASURES

The test homogeneity approach is based on definitions of reliability like that of the Hoyt (1941) procedure, the Kuder-Richardson (1937) Formula 20, or the alpha coefficient (Cronbach, 1951). The variance between persons produced by a test can be broken down into two parts: One part consists of variance specific to the separate items; another part consists of covariance between items. Basically, for a measure to have satisfactory reliability, the covariance portion must comprise an appreciable
fraction of the total variance. Put another 
way, if each item measures only something 
unique, it is illogical to add scores from items 
and obtain a “total score." Several ways of 
grouping or relating the terms in this defini- 
tion are possible, and as Scott (1960) showed 
they are all closely related to one another. 
Details of analyses of variance to estimate 
reliabilities, with numerical examples, are 
nicely summarized in Winer (1962, pp. 124- 
132). Cronbach's alpha coefficient and the
variance component estimate, although derived
differently, are equivalent (see Stanley,
1970). If items are scored dichotomously, the 
Kuder-Richardson Formula 20 is appropriate, 
and is equivalent to Cronbach's alpha for 
continuous scores. 
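
The variance and covariance decomposition just described can be made concrete with a short sketch of Cronbach's alpha, written in the standard form alpha = [k/(k - 1)][1 - (sum of item variances)/(variance of total scores)]. The function name `cronbach_alpha` and the small data sets are illustrative assumptions, not data from the studies discussed here.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items table of scores.

    scores: one list per person, holding that person's score on each item.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two persons")
    # Variance specific to each item, taken across persons.
    item_variances = [variance([person[i] for person in scores])
                      for i in range(n_items)]
    # Variance of the total scores; this contains the interitem covariances.
    total_variance = variance([sum(person) for person in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Two items that rank persons identically: all shared variance.
    print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
    # Two items with zero covariance: each measures only something unique.
    print(cronbach_alpha([[1, 0], [0, 1], [1, 1], [0, 0]]))
```

When the items covary perfectly the coefficient is 1, and when the items share no covariance it is 0, which is the sense in which adding scores from items that each measure only something unique is illogical.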

In scoring need achievement stories (see 
Atkinson, 1958) each story is rated initially 
as -1 (unrelated imagery), 0 (doubtful or
task imagery), or +1 (achievement imagery). 
Only if achievement imagery (+1) is scored 
can further scores be assigned on a dichoto- 
mous (0-1) basis. Eleven further decisions 
are possible, so total scores range from —1 to 
+11. The number of decisions on which a 
picture score is based need not be the same 
from person to person. Thus some persons 
may have scores based on only 3 decisions, 
other persons may have scores based on 8, 
10, or even 12 decisions. This is tantamount 
to different people taking different tests if 
individual decisions are defined as "items." 
The strategy taken in this article is to rede- 
fine "item"—to let the pictures be the items. 
Pictures are defined as items in estimating 
reliabilities for several sets of empirical data, 


RELIABILITY ESTIMATES FROM 
EMPIRICAL DATA 


Ninth-Grade Survey 

In a large survey of several cognitive style 
variables (see Footnote 3) a need achieve- 
ment fantasy-based measure was included. 
Four pictures especially developed were given 
to 665 ninth graders. Interscorer checks made 
on 100 sets of four pictures demonstrated


TABLE 1


RELIABILITY (INTERNAL CONSISTENCY) ESTIMATES
FOR NINTH-GRADE SURVEY DATA(a)


Girls Boys 
Group Io» | K-R 20 estimate | Cronbach alpha K-R 20 ial | , rental alpha 
n | n 
Dichotomous Full-scale Dichotomous 
scores scores scores 

Inner city f 
Black Low 30 + zs 30 .06 š 
Medium | 41 | 27 21 29 «0 P 

White Medium 16 27 27 16 .29 es 
Blue collar j8 
Black Low 22 38 Evi 21 16 E 
Medium 30 <0 <0 25 E ET 
White Medium 30 .09 23 30 ERI T 
High 30 .56 32 19 A9 3 
Rural white Medium 28 <0 <0 29 2 ic 
High 30 33 E 20 .56 4 
Middle-class white | Medium 20 AS of 22 AO. "a 
High 30 0 E 30 .60 . 33 

Mean 127 27 32 ae 
Initial trial white . T 
group Medium 30 54 7 21 68 E 
High 16 56 60 20 60 |t 
Total 353 312 


tried and then the b 
a D. R, Entwisl Greenberg: 
ductivity. Baltimore, Md, : Center for Social Org 
^ "Medium IQ" students have IQs in the rang 
"Low IQ" students have IQs in the range 70-85. 
the 92nd percentile on national norms. 


our for each s 


ion of School 
-114 or SI 


interscorer agreement of 92%. There are 13 strata of each sex, ranging in size from 16 to 41 individuals (see Table 1), and strata vary on social class, race, sex, IQ, and other demographic variables. From a matrix of scores for individual pictures, Cronbach's alpha values can be estimated (see Table 1). For girls, the average homogeneity (alpha estimate) is .27, and for boys it is .33. The strata in the last two rows of the table are not included in the average because pictures were discarded or retained after inspecting the data for this sample.

It turns out that the intercorrelations (six possible) between scores on each of the four pictures within sample strata have average values ranging from .05 to .16 for girls and from -.01 to .19 for boys. These correlations appear very small, even to casual inspection when one considers that each picture score represents 25% of a fairly long test. In fact, as shown elsewhere (see Footnote 3) these

of cognitive style in Maryland ninth graders: 


AT scores 


n. 
AL norme 
C l' aci e es ab 
ligh IQ" students have [Qs in the range 128 or higher or SC D » 
e 
" r tb 
correlations would have to exceed .4 for the


tasti ro* 
achievement motivation: p 

, The Johns Hopkins University, 1970. " 
»etween 39th and 60th percentile on 


test to have reliability in excess of .7. If tests have many items, smallness of interitem correlations can be tolerated because the number of covariance terms increases rapidly with each additional test item. Here, with four items, reliability is bound to be low with low interitem correlations.
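
The dependence of reliability on test length and average interitem correlation can be sketched with the generalized Spearman-Brown relation, reliability = k r / [1 + (k - 1) r], where k is the number of items and r is the average interitem correlation. It is an assumption of this sketch that this relation underlies the statements above; the function name `reliability_from_mean_r` is hypothetical.

```python
def reliability_from_mean_r(n_items: int, mean_r: float) -> float:
    """Reliability of a k-item composite from the average interitem correlation."""
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


if __name__ == "__main__":
    for k, r in [(4, 0.40), (4, 0.16), (40, 0.16)]:
        print(k, r, round(reliability_from_mean_r(k, r), 2))
```

With four items, an average interitem correlation of .40 gives a reliability of about .73, whereas an average of .16 gives about .43; the same average of .16 spread over 40 items would give about .88, which is why small interitem correlations can be tolerated only in long tests.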


Reanalysis of Data of McClelland, Atkinson, Clark, and Lowell (1953)


Some data for a Latin square desig" 
presented by McClelland et al. (1953; " ws)? 
for eight four-man groups of subjects re the 
where eight pictures (A through H) ES 
Latin square factor, and eight ord 
tions of these pictures are used 


ganged "29 
This published matrix can be — foul 


are 


order of presentation neglected, A 
man groups are now the row facto! 4 acto" 

tures (A through H) are the colum? ctu | 
Since the matrix now gives scores for I | 


p the group composition is the same 
P TON; one can compute interpicture 
i ns based on mean scores for four- 
Subject groups (lower half of Table 2). 
BE duin between means will be attenu- 
Feat to correlations of the original 
a E probably by about 50%, but 
m à Hs of 28 correlations, almost half 
E ja ive, The average interpicture corre- 
"method 1 fas. Following Guilford's (1954) 
1 average lor estimating test reliability from the 
448€ Intercorrelation, one estimates a reli- 


abili 5 
iiM value of .22. Assuming an average 
item correlation of .13, to compensate for 
] 


e 1 i4 ae . 
of p nuation, leads to a reliability estimate 


Other Unpublished Data

[...] data obtained recently from other [...]
indicate low reliability for fantasy-based
achievement scores.

[In an attempt] to select young women who
could [...] training [as] advanced dental
technician[s], Hilzenrath (personal
communication, July 1969) included six need
achievement pictures in a large test battery.
He generously made available individual
picture scores (n = 98). The means, standard
deviations, and interpicture correlations are
TABLE 2
MEAN PICTURE SCORES

[Table body not recoverable from the source scan. The lower half gave interpicture correlations based on mean scores for four-subject groups.]

Note. Data are from McClelland et al. (1953).
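TABLE 3
Summary of Item Statistics (Unpublished
Data from Other Workers)

[Upper half (Hilzenrath data): the means, standard deviations, and column positions of the intercorrelations are not legible; the surviving entries are listed by row as scanned.]

Picture no.
1
2        .13   .06   .09
3        .26   .04   .19
4        [?]  -.06   .06
5        .01   .06
6        .00
Total    4.55         alpha value = .33

Alschuler and Hann data
[Row assignment of the intercorrelations is reconstructed from the triangular layout; [?] marks illegible entries.]

Picture                   Intercorrelations with picture no.
no.      Mean    SD       2     3     4     5     6
1        1.62   1.17     .11   .11   .04   .03   .04
2        2.29   1.61           .15   [?]   .14   .11
3        1.80   1.60                 [?]   .18   .05
4        2.11   1.42                       .12   .12
5        4.00   2.25                             .16
6        1.71   1.44
Total   13.53   5.11     alpha value = .43

Note. Hilzenrath data: personal communication, 1969. Alschuler and Hann data: A. Alschuler and M. Hann. The motivational impact of individualized instruction at Washington Junior High School, Duluth, Minnesota. (Office of Education Working Paper No. 10, Contract 0-8-071231-1747) Cambridge: Harvard University Graduate School of Education, 1969.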


shown in the upper half of Table 3. These 
item statistics lead to an alpha estimate of 
.31. This study was carried out using dif-
ferent pictures from the other studies re- 
viewed here, was scored by experts, and 
perhaps more important, used a longer test 
(six pictures) than studies reviewed above. 

Alschuler and Hann (see Footnote 4), in devising training
procedures to increase need achievement in 
disadvantaged youngsters, obtained scores on 
need achievement for 108 boys and girls in 
the sixth and seventh grades in Duluth, Min- 
nesota. Item statistics are presented in the 
lower half of Table 3. The alpha reliability 
turns out to be .43. 

Estimates of alpha coefficients from three 



4 A. Alschuler and M. Hann. The motivational
impact of individualized instruction at Washington
Junior High School, Duluth, Minnesota. (Office of 
Education Working Paper No. 10, Contract 
0-8-071231-1747) Cambridge: Harvard University 
Graduate School of Education, 1969. A. Alschuler 
kindly made available some of the data on indi- 
vidual pictures secured for the Alschuler and Hann 


report. 
different sets of data for which item statistics
could be obtained are thus consistent with 
values seen in the Entwisle and Greenberger 
(see Footnote 3) survey. The latter two sets 
of data, with more pictures (six rather than 
four), do not suggest that large increases in 
reliability are likely to occur with feasible 
increases in test length. 


Use of a Dichotomous Scoring Scheme 


There is a marked resemblance between 
the distributions of examinees when a dichoto- 
mous rather than a full-scale scoring scheme 
is used to derive achievement motive scores. 
To be specific, if each picture in the ninth- 
grade survey (see Footnote 3) is scored 
dichotomously, 0 or 1, depending only on
whether achievement imagery is present or 
absent, and the remaining scoring categories 
are ignored, there are high correlations be- 
tween the two sets of scores (.88 for all 
black students, .90 for all students of average 
IQ combining on race, and .92 for all white 
students). 

When a dichotomous scoring procedure is
used for Hilzenrath's and Alschuler and
Hann's data, and the dichotomous scores
then compared with full-scale scores, the
correlation is also high: .88 for Hilzenrath's
data (n = 98) and .89 for Alschuler and
Hann's data (n = 108). Data obtained much
earlier also confirm this relationship, for
Ricciuti (see Footnote 5) found high
correlations between total achievement scores
and scores on single categories for a
12-picture test (achievement imagery, .92;
achievement theme, .86) for 147 high school
juniors. Also Ricciuti and Sadacca (see
Footnote 6) found high correlations (.90 and
.70) between the achievement imagery and
theme categories together and total scores
for two groups (n = 53; n = 79) taking two
different nine-picture tests. The consistency
of this finding in three independent sets of

5 H. N. Ricciuti. The prediction of academic grades
with a projective test of achievement motivation: I.
Initial validation studies. (Tech. Rep. No. 1, ONR
Contract Nonr-694(00)) Princeton, N. J.: Educational
Testing Service, 1954.

6 H. N. Ricciuti and R. Sadacca. The prediction of
academic grades with a projective test of achievement
motivation: II. Cross-validation at the high
school level. (ONR Contract Nonr-694(00)) Princeton,
N. J.: Educational Testing Service, 1955.
modern data, plus the earlier findings of
Ricciuti (see Footnote 5) and Ricciuti and
Sadacca (see Footnote 6), suggests that
equivalence of dichotomous and full-scale
scores for fantasy-based achievement motive
measures is a widespread phenomenon.

Exactly what does this imply? It implies
first that defining a picture as an item to
permit reliability estimates of internal
consistency, as done here, is probably
sensible. More important, it suggests
comparing reliabilities based on dichotomous
scores with reliabilities based on full-scale
scores (see Table 1). It turns out that the
two sets of reliabilities in the ninth-grade
survey compare very favorably, even for those
[strata] where reliability based on full-scale
scores is relatively high. In fact, the average
reliabilities are just about identical (.27 for
girls for both scoring methods, and .32 and
.33 for boys, see Table 1). Similar
computations for Hilzenrath's data yield .33
and for Alschuler and Hann's data yield .48,
in both cases higher than reliabilities based
on full-scale scores.

The effort involved in obtaining the
full-scale scores is much greater, of course.
The difficulty of scoring a story in the
traditional manner depends on the story
itself, but [scoring] time should be greatly
reduced if dichotomous rather than full-scale
scores are [used]. Since validity is limited
by reliability, [no] more validity appears
attainable with full-scale than with
dichotomous scores. In [spite] of lengthy
writing of stories by respondents and
elaborate and time-consuming [scoring], then,
it turns out that the fantasy-based measures
are equivalent to one dichotomous decision
per picture. With usually [...] pictures, it is
not surprising to note reliabilities in the
.30 range. The practical implication is that
the “rich” data provided [by a] story of
perhaps 300 words turn out to provide the
basis for only one decision, [one] piece of
information. Furthermore, [if more] pictures
are added there is no guarantee that alpha
values will increase un[less there is an]
underlying common factor. [...] providing
more pictures is [...] because even with four
pictures the [task is] so tedious that it is
difficult to get respondents to cooperate.
The initial studies of the need achievement
measure, reported in McClelland et al.
(1953), make little mention of reliability
determinations. This trend has continued,
and most of the need achievement literature
gives the percentage of interscorer agreement
as the only reliability information
(exceptions are noted in detail later).
Clearly, although many disagreements between
scorers impair reliability, high interscorer
agreement is only one requisite for high
reliability of a test. Two scorers could agree
with one another perfectly in assigning scores
for each picture, but if the scores are
uncorrelated from one picture to the next, the
total score is made up of a series of
unrelated numbers.

To take a concrete example, it is as though
one had a test like the following:

1. How old is your father?
2. How much did you pay for your last
car?
3. As a child, how many times a week
were you spanked?
4. How much sleep (in hours) do you
average each night?

[Such questions] would yield unrelated, or at
[most slightly] related, answers. There is
little [doubt] that scorers could be trained
to score them in the same fashion. One would
[hesitate], however, to add scores from four
such questions and then seek to correlate
total [scores] with other variables, because
correlations would depend mostly on specific
[factors in] the four questions. This is
analogous to what is apparently occurring in
need achievement measurement when
correlations between [pictures] are close to
zero, but when [agreement] between scorers is
high.

Many workers in the field of motivation
measurement feel that important motives are
not under voluntary control and, in fact, that
a person may be unable or even unwilling to
report on his own motivational states. Thus,
the measures obtained through direct
questioning, forced choice, or by other
paper-and-pencil tests, have questionable
validity: what is being measured is some
superficial attribute, perhaps only a
response set. There


seems to be a basic misunderstanding, how- 
ever, for the next conclusion is often that 
because the high reliability that is character- 
istic of paper-and-pencil tests is, in their 
opinion, preferred to measures of doubtful 
validity, it is “better” to have measures with 
low reliability that are valid in the sense of 
measuring a theoretically relevant or pre- 
dictively relevant variable. The whole issue 
hangs, of course, on how low the reliability is, 
and for what purposes the testing is under- 
taken. 

In many present day applications, achieve- 
ment motive scores are desired to assess 
training programs (like Alschuler & Hann’s 
data; see Footnote 4), or to explore relations 
with other subject variables (like Hilzenrath’s 
data), or to investigate differences between 
subcultural groups (like Entwisle & Green- 
berger’s data; see Footnote 3). For the first 
two uses, which involve correlational studies, 
low reliability is an almost insurmountable
problem. For the third use, apparently even 
reliabilities in the .30 range are hard to at- 
tain for low-status respondents (see Table 1). 
Here too in order to explore causal relation- 
ships between social class and school be- 
havior (often the underlying purpose in such 
research), higher reliability than that needed 
to establish differences between means is 
probably required if the results are to be 
used in suggesting interventions. A review of 
the literature (Table 4) suggests that occa- 
sionally reliability coefficients as high as .50 
are found using test-retest or alternate forms, 
but the average reliability for fantasy-based 
measures is probably about .30. This implies 
that it will be very difficult, if possible at all, 
to discern relationships with other variables.

The major published evidence on reliabil- 
ity of fantasy-based achievement-motivation 
measures contained in the periodical litera- 
ture is summarized in Table 4. Generally, 
published reliability estimates are below .50 
with the exception of some questionable esti- 
mates (see footnotes to Table 4). Morgan's 
data based on 12 pictures are the only real 
exception. Some of this evidence needs further 
explanation and is discussed below. 

The evidence presented on reliability of 
the fantasy measure in the basic source book 
for the instrument (McClelland et al., 1953,
TABLE 4
SUMMARY OF PUBLISHED RELIABILITY ESTIMATES FOR FANTASY-BASED ACHIEVEMENT MOTIVE SCORES

Columns: Study; number of Ss; number of pictures; reliability.
[Most numerical entries are not recoverable from the source scan. The studies are listed under the method headings as far as they are legible.]

Test-retest: long term
  Kagan & Moss (1959), 3-year interval (a)
  Moss & Kagan (1961), 10-year interval (b)
Test-retest: short term
  Krumboltz & Farquhar (1957), 9-week interval
  [...] Heinemann [...], 1-month interval
Split half
  Child, Frank, & Storm (1956)
  Lindzey & Herman (1955)
  [Reitman &] Atkinson (1958)
  Weinstein (1969)
Equivalent forms
  Birney (1959)
  Haber & Alpert (1958), 3-week interval
  Klinger (1968)
  McClelland et al. (1953), 1-week interval
  Morgan (1953), 5-week interval

Table notes, as far as legible:
a The same Ss responded at ages 8-9, 11-6, and 14-6. Phi coefficients are reported between Occasions 1-2, 1-3, and 2-3, respectively.
b Ss were tested at [...]. The value given is for a contingency coefficient.
c All possible split h[alves ...] alpha.
d The larger estimate [...] [Spea]rman-Brown formula for a test of 20 pictures [...] estimate of score stability [p. 267] [...] examination of scores used in obtaining four-picture estimate.
e The same Ss responded to four cards at intervals of 1 week for three different test sessions. The correlations reported [...] Week 1-2, [...].
f Alternate forms administered 1 week apart [...] three separate alternate forms tests of 12 pictures each given to three groups of [...] high school boys. The [...] in each group varied, and it is impossible to tell from the published data the number of Ss on which particular [estimates] are based.

pp. 185-217) is neither extensive nor
encouraging. A reanalysis of some of
McClelland et al.'s data [...] internal
consistency would have been [...]. The
reliability [...] are also low. [...] a need
achievement [...] (p. 188) that used [...] 32
persons. [...] two equivalent three-picture
tests (using six of the eight pictures of the
four-picture versions) were assembled. The
correlation for the [two forms] of the
three-picture version, using portions of the
scores of the same 32 persons, is reported as
.64 (Table 4). When these two equivalent
forms were administered to a group of 40
male college students under neutral
conditions with an interim of 1 [...] between
measures, the correlation is reported [as
.22]. Thus one alternate-forms estimate is
.64; another is .22. Why the discrepancy
between .64 and .22? It would appear that the
first trial capitalized on a particular set of
chance fluctuations, since several
arrangements of pictures were tried post hoc
in eliminating pictures, and, as so often
happens, a second trial was less favorable.
Other calculations of “percentage agreement”
based on these same data are somewhat
misleading (see Murstein, 1963, p. 157). The
percentage agreement bears no systematic
relation to the size of a product-moment
correlation and could be high when the
correlation was low or the reverse.
The other major source book, Motives in
Fantasy, Action, and Society (Atkinson, 1958),
has two chapters presenting some information
on reliability: Chapter 45 by Haber and
Alpert giving some data for parallel forms,
and Chapter 46 by Reitman and Atkinson
giving some split-half data.

Haber and Alpert tested subjects twice with
[equivalent] forms of 6 pictures each (12
different pictures). The 12 test pictures were
carefully culled from a pool of 120 pictures
using pretests with 200 subjects. When a
control group with 26 subjects was tested
with the equivalent [forms] under the same
verbal instructions 3 [weeks] apart, the
correlation was .54. For an experimental
group, where the second testing was done
under achievement-arousal conditions
(different instructions) with 54 subjects, the
correlation was .45. Reitman and Atkinson's
estimates are lower: .28 to .38. [Not] enough
information is given in either of [these]
published chapters to allow estimates of
[item] statistics.

Other more recent books in this area make
[no] mention of reliability (Atkinson &
Feather, 1966; McClelland, 1961; McClelland
& Winter, 1969) or comment briefly on [low]
reliability noted by others (Heckhausen,
1967; Kagan & Lesser, 1961, p. 278).

The analysis of several sets of unpublished
data presented earlier and this brief review
of published findings both suggest that
reliability of fantasy-based measures of
achievement motivation is generally low.
Even a [...] survey of the literature can be
misleading, however, because of the often
remarked tendency of American authors to
publish [...] findings, with lesser findings
[never reach]ing an editor's desk or even
being rejected. Nevertheless, most of the data
[assembled] here suggest that the mean value
[of reliability] (test homogeneity) for
fantasy-based need achievement measures is
around .30. With a sample size of 40, a
typical size
for achievement motivation studies of a 
decade ago, such reliability estimates would 
have a standard deviation of about .16 (see 
Jensen, 1959). Sampling fluctuations could 
easily cause a coefficient with this mean value 
to range up to .60 on the positive side. Ac- 
tually, .60 is a larger reliability estimate than 
most of those values seen in Table 1. If the 
true mean is higher, then one would expect 
some values in the literature considerably 
higher than any that have been reported, per- 
haps between .80 and .90. 

Jensen (1959) and Mitchell (1961) pre- 
sented complementary reasons explaining how 
the low reliability comes about. Jensen, in 
noting the low internal consistency reliability 
of the TAT in spite of its high scoring reli- 
ability, concluded that “S’s response to one 
card is no basis for prediction of response on 
other cards [p. 123]." Mitchell, in factor 
analyzing a number of need achievement 
measures including one measure based on 
fantasy production, found that the only “fac- 
tor” in the fantasy-based measures was an 
“error” factor. Both these authors pointed to 
the failure of fantasy-based measures to yield 
sufficient covariance to overcome shortness in 
the test, the outstanding conclusion from my 
analysis of item statistic data presented 
earlier. Other tests, for instance the Scholastic 
Aptitude Test, have low interitem correla- 
tions, but the number of items is sufficient to 
compensate. With few items, the items must 
be highly related for a reliable instrument to 
emerge. 
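
The trade-off between the number of items and the interitem correlation can be made concrete by inverting the Spearman-Brown relation. This is a sketch under the assumption of equal item variances; the function name `required_mean_r`, the target reliability of .80, and the item counts are illustrative choices, not figures from the studies reviewed.

```python
def required_mean_r(k, target):
    """Average interitem correlation needed for a k-item test to reach
    the target reliability (Spearman-Brown relation solved for r)."""
    return target / (k - (k - 1) * target)


# Hypothetical target reliability of .80 for tests of different lengths.
for k in (4, 6, 12, 100):
    print(f"k={k:3d} items -> mean r needed = {required_mean_r(k, 0.80):.3f}")
```

A four-item test needs an average interitem correlation of .50 to reach .80, whereas a 100-item test needs less than .04, which is why a long test can tolerate low interitem correlations and a four-picture measure cannot.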


OTHER EVIDENCE SUGGESTING LOW
RELIABILITY


Several other kinds of evidence are con- 
sistent with the assertion that fantasy-based 
achievement measures have low reliability. 
These are now reviewed. 

First, there is a disturbing failure to find 
correlations between measures supposed to 
assess the same motive, between the French 
Test of Insight and fantasy-based measures, 
for example. Three methods of measurement 
—direct questioning of subjects, observers’ 
ratings, and fantasy-based—are said to yield 
essentially uncorrelated results (Atkinson, 
1958, p. 41; Crandall, 1963). Other reports
not cited by Atkinson or Crandall (Atkinson 
& Litwin, 1960; Bendig, 1957; McClelland 
et al., 1953, p. 256; Shaw, 1961; Weiss, 
Wertheimer, & Groesbeck, 1959) and a re- 
cent summary by Weinstein (1969) heavily 
reinforce this conclusion of lack of con- 
vergence among measures of need achieve- 
ment. 

Second, recent studies, carefully done and 
more extensive in scope than earlier studies, 
yield few positive relationships between need 
achievement and other variables. As studies 
have increased in size and tightened methodo- 
logically, positive findings are fewer. A few 
examples will show this. 

1. When Baughman and Dahlstrom (1968) 
computed 96 chi-square values between 
teachers’ ratings of achievement behavior in 
the classroom and achievement test scores 
(n = 480), one was significant. In spite of a
careful and sophisticated examination of a 
large number of children and consideration of 
many other variables with fantasy-based need 
achievement, there were few positive findings 
for Baughman and Dahlstrom to report. 
(From unpublished data for 13-year-olds, 
generously furnished me by Baughman [per- 
sonal communication, July 1969] I calculated 
reliabilities, corrected by the Spearman-Brown 
formula, using scores from two sets of six 
line drawings given 1 week apart. For Negro 
and white girls and Negro boys estimates lie 
below .50; for white boys, .60. Their study 
also included 7-, 9-, and 11-year-olds.) 

2. From a study of fourth- and fifth-grade 
boys (C. P. Smith, 1969), few positive results 
emerge: no relation was found between 
achievement motive scores and indexes of 
independence training for either parent; no 
relation was found between three measures 
of reading performance and motive score; 
and so on. 

3. Using a four-picture fantasy-based mea-
sure of need achievement with 238 ninth- to
twelfth-grade boys in West Virginia, Tseng
and Thompson (see Footnote 7) found no difference among

7 M. S. Tseng and D. L. Thompson. Need achievement,
fear of failure, perception of occupational
[prestige] and occupational aspiration of adolescents
[of] different socioeconomic groups. Paper presented
at the meeting of the American Educational Research
Association, Los Angeles, February 1969.
three social class groups in achievement
motivation, although they did find differences
among the groups on test anxiety and
occupational aspirations.

The list of recent examples could be
extended, but the point is already made: few
positive results have been found in [recent]
large-scale studies when attempts were made
to link fantasy-based scores to other variables.


WHAT MAY PRODUCE FINDINGS


The reader may find it perplexing that
need achievement, a measure that seems from
the overview here to have low reliability, is
purported to predict school grades or
performance at tasks like anagrams. The
measure, in other words, is reputed to have
predictive validity. Is this reputation
supportable? I do not believe so.

Especially in view of the tendency for [only]
positive results to be reported, the number
of studies reporting relationships between
achievement motivation and scholastic
performance is not large. Klinger (1966) [in a]
careful review found 32 such studies [and]
concluded that 17 showed significant
[relationships] with molar performance
measures (by [which] he means course grades,
grade averages, ratings of long-term behavior
patterns, [etc.]) and that 15 showed
nonsignificant [relationships] with such
measures. About half the studies, then,
reported a statistically significant
relationship. Relationships with
task-performance measures (brief sequences of
behavior [like] anagram solutions) were less
favorable, [with] 11 significant and 16
nonsignificant. [...] the positive studies
noted by Klinger [could] rather easily
represent instances [where] relationships
between motive scores and [grades] spring
from the correlation between [motive scores]
and productivity (and/or IQ). This point
requires elaboration.

In the ninth-grade survey (see Footnote 3)
consistent productivity differences occurred
between sexes, between IQ [groups], and
between some social class groups. Girls write
more words per story than boys, high-[IQ]
children write more words than low-[IQ]
children. Furthermore, productivity
[correlates] with school grades better than
motive [scores], and for some groups the
correlation [between] productivity and grades
is sizable. For example, the average
correlation between number [of] words written
per story and grades in four [school] subjects
is .41 for blue-collar white [boys of] high
IQ, and .48 for average IQ middle-class white
boys. The uncontrolled influence of
productivity could thus easily account for
correlations between motive scores and grades
in those subgroups where motive scores happen
to have modest reliability. For [the] kinds of
respondents whose motive [scores are] of
above-average reliability in the [ninth-grade]
survey, productivity is a good [predictor] of
grades. This is because productivity is
probably [an] index of verbal achievement and
[probably] also of academic socialization (see
Footnote 3). When experimenters enter a
[classroom and] ask students to write
“imaginative [stories” about] pictures, the
student's willingness to [comply] and effort
expended are reflected in [the] number of
words written. A student's willingness to
comply with similar requests of teachers (his
academic socialization) [probably] influences
his grades.

TABLE 5
[CORRELATIONS BETWEEN] GRADES AND PRODUCTIVITY, CORRELATIONS [BETWEEN] GRADES AND MOTIVE SCORE

Column headings, as far as legible: Group; correlation of motive score vs. productivity; productivity vs. grades (English, Social science); [motive score vs. grades] (English, Social science).
[Column alignment of the surviving entries is uncertain; [?] marks illegible entries.]

Medium IQ girls    37   37   [?]  [?]  [?]  18
High IQ girls      60   29   [?]  43   30   25
Medium IQ boys     [?]  [?]  26   56   37   [?]
High IQ boys       60   [?]  27   22   [?]  25
Average            -    38   34   20   29

A specific example of a correlation between
motive scores and school grades mediated
through productivity will further illustrate
the point. In the ninth-grade survey,
reliabilities for the four subgroups of the
pilot study ranged from .57 to .71, higher
than reliabilities noted for 22 subgroups
interviewed subsequently, because test
pictures were selected after inspecting scores
for the pilot [study]. Relationships between
productivity [and] grades (in English and in
Social Sciences) are noticeably stronger than
those between achievement motive scores and
grades (Table 5) for these subgroups, and
the correlations between productivity and
need achievement scores are sizable.
Apparently, then, when motive scores do
achieve modest reliability, they correlate
with productivity, but the correlation between
productivity and school grades exceeds the
correlations between the motive scores and
grades.
Within all strata of the ninth-grade survey 
(n ranging from 16 to 41) correlations be- 
tween productivity and achievement motiva- 
tion are altogether unimpressive as would be 
expected with the low reliability of the mo- 
tive measure, but productivity consistently 
correlates more highly with grades than the 
fantasy-based motive scores. IQ, seldom con- 
trolled in achievement motive studies, is 
roughly controlled in the ninth-grade survey. 
There is, however, a strong correlation be- 
tween IQ and productivity that, if uncon- 
trolled, would further inflate correlations be- 
tween productivity and grades. When both 
productivity and IQ are uncontrolled, as in 
most studies of achievement motivation, an 
occasionally reliable motive score (the reli- 
ability statistic is of course subject to sam- 
pling fluctuation) could transmit the influence 
of both prior variables and lead to “predic-
tions” of school grades from motive scores.
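
The argument that an uncontrolled variable can carry the whole motive-grade relationship can be illustrated with a first-order partial correlation. The sketch below uses invented correlations, not values from the survey, and the function name `partial_corr` is hypothetical.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: x = motive score, y = grades, z = productivity.
r_motive_grades = 0.20
r_motive_productivity = 0.45
r_productivity_grades = 0.45

print(round(partial_corr(r_motive_grades,
                         r_motive_productivity,
                         r_productivity_grades), 3))
```

With these illustrative figures the motive-grade correlation of .20 falls to essentially zero once productivity is held constant, which is the pattern the text describes as mediation through productivity.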
Corroboration of this role of productivity
in generating results attributed to achievement
motivation is contained in Ricciuti and
Clark's (see Footnote 8) report that in a
12-picture test, word counts on 11 of the 12
pictures significantly (p < .01)
differentiated between relaxed (n = 75) and
achievement-aroused (n = 71) subjects even
though achievement motive scores on only 5 of
the same pictures differentiated at a similar
level. Earlier Ricciuti (see Footnote 5) and
Ricciuti and Sadacca (see Footnote 6)
anticipated my finding that productivity
predicts school grades better than motive
scores, for they found that a word count was
just about as effective a predictor of school
grades as their motive score. Incidentally,
word counts based on four stories have very
high internal consistency, ranging from .76 to
.98 in the small homogeneous samples of the
ninth-grade survey (see Footnote 3).

As already noted, IQ when not controlled 
undoubtedly contributes to correlations be- 
tween achievement motive scores and per- 
formance measures like school grades when 
such correlations are found. Shaw's study is 
one of the few in Klinger's (1966) list to 
control on IQ, and it shows essentially no 
relation between several measures of achieve- 
ment motivation and achievement in high 
school juniors and seniors. 

Another persistent finding in the achieve- 
ment motivation literature is a sex difference. 
It is reported (cf. Klinger, 1966, Table 1) 
that fantasy-based measures are more often 
related to school grades for males than for 
females. Of the eight college studies of fe- 
males, six reported nonsignificant relations 
between achievement motivation and grades 
for females, while for males about half of the 
studies found significant relations with grades. 
This sex difference in the “predictive
validity” of achievement motive measures has
provoked considerable speculation, with
inappropriate [stimulus] material (pictures
[...] for females of vocational [...], etc.)
being a favorite explanation for the failure
to note significant relations for females.
Often, too, it is speculated that female
achievement needs are expected in different
areas than those appropriate for men.

8 H. N. Ricciuti and R. A. Clark. A comparison of
need achievement stories written by experimentally
[“relaxed”] and “achievement-oriented” subjects:
Effects obtained with new pictures and revised scoring
categories. (ONR Contract Nonr-694(00)) Princeton,
N. J.: Educational Testing Service, 1957.

The ninth-grade survey data (see Table 1)
show low reliability generally, but higher
reliabilities for males than females,
particularly for those who are white and of
high IQ. This suggests that one reason for a
sex difference in “predictive validity” is the
generally lower reliability found for females
on the motive measure. There are also two
other kinds of statistical factors operating
to reduce correlations between girls’ grades
and motive scores. First, unlike boys, girls’
productivity scores do not show much
relationship with grades. (Perhaps all girls
are above some critical level in terms of
academic socialization; at any rate, girls are
consistently higher in productivity than boys
over all [strata].) Second, girls’ grades, the
criterion variable, have less variance than
boys’. For white middle-class students, boys’
grades have [an] average standard deviation
about twice [that] of girls in the ninth-grade
survey. (Coleman, 1961, noted a similar
reduction in variance in girls’ grades.)
Girls’ grades in English, Social Science,
Mathematics, and Science also have lower
intercorrelations (Table 6) than [boys’],
indicating less consistency (less
reliability). Both the narrowed range and
lower intercorrelations in girls’ grades would
attenuate relationships between girls’ grades
and other variables. If relationships between
motive scores and grades are actually
[mediated] through productivity, as I suppose,
[... for] girls [...] relationships between
productivity and grades plus the lower
reliability in the [...] grades [...] sex
differences in “prediction” of school [grades]
from fantasy-based need achievement scores
[...]. [... Test of In]sight and the Iowa
Picture Interpretation Test (see Klinger,
Table 1, 1966) [would] be predicted if
properties of the [grades] themselves are
responsible for sex differences.

DISCUSSION yen 


) jasm for eg 
Part of the current enthusias ayati 


jV us 
P P mot 
ing methods to assess achievement „ged 


" : . tdisadvan 
lor young children and for disa 
TABLE 6
INTERCORRELATIONS OF GRADES

                              English/   English/   English/   Social Science/   Social Science/   Mathematics/
Group                         Social     Mathe-     Science    Mathematics       Science           Science
                              Science    matics

High IQ blue collar, rural,
and middle class
    Boys                        59         60         49            40                53               42
    Girls                       47         41         24            44                41               30
Middle class, high IQ
    Boys                        59         55         62            48                56               67
    Girls                       39         24         00            39                17               26

Note.—Decimals have been omitted.


stems from the belief that achievement
motivation has been successfully measured
in middle-class high school and college
[...] and that it has been linked in significant
ways to school achievement. Of the
many hundreds of articles in the achievement
motivation literature, only a handful
try to [...] the motive to academic performance.
The evidence I have assembled suggests
that verbal productivity and/or IQ
may be sufficient to account for the few
positive relationships that do exist. Fantasy-based
measures of achievement motivation
[...] have little or no independent predictive
[...] for school performance. I
[...] out how the few "positive" results
[...] emerge despite the generally low
reliability, and how the sex differences could
arise [...] function of differential reliability
of grades and motive tests.
Disenchantment with need achievement
measures based on fantasy production is actually
more widespread than I at first supposed.
Klinger (1966) stated: “The overall
finding [...] can only be described as
[...].” In his recent review on
the development of achievement dispositions,
Solomon [...] stated that it may be necessary
to abandon entirely the notion of a
[...] global motive in favor of many relatively
[...] motives specific to particular
[...] competition. For example, he
[...] the lower-class black’s dis[...]
in [...] learning may not reflect
low [...] motivation but rather a
[...] of achievement-related behaviors
[...] extra-classroom objectives. Much the
same point
of view is expressed by M. Smith
(1968) who questioned the generality of fan- 
tasy measures of achievement motives and 
their openness to contaminating influences.
He felt that there has been slippage between 
the theoretical definition of the motive and 
what has actually been captured in measure- 
ments. 

The present article looks at need achieve- 
ment research differently from Klinger, Solo- 
mon, or Smith (whose concern is with predic- 
tive validity) but is consistent with their 
points of view. The purpose here has been to 
demonstrate that the inconsistency in results 
and the lack of predictive validity most likely 
stem from low reliability of the measure 
(estimated to be typically in the range .30 
to .40). The conclusions I reach about low 
reliability lead to resolution of some of the 
“paradoxes” in the need achievement literature,
for instance, the puzzling failure to
find “meaningful” relationships for women 
when they are found for men. Correlations 
with other variables like IQ and story length, 
which are almost never controlled, have
probably led to occasional significant correlations
with dependent variables in small samples, as 
I have tried to show. As samples have in- 
creased in size, the number of positive find- 
ings has diminished. The finding of more 
numerous significant relationships with 
grades for high school students than for col- 
lege students is probably owing to the greater 
variability in high school grades rather than 
any properties of the motive measure. 
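
The argument that low reliability limits predictive validity can be illustrated with the classical attenuation relation, in which an observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The sketch below is only an illustration under that classical assumption; the function name, the true correlation of .50, and the criterion reliability of .80 are hypothetical and are not taken from the studies reviewed.

```python
import math


def attenuated_correlation(true_r, reliability_x, reliability_y=1.0):
    """Observed correlation expected under classical test theory.

    Assumes the classical attenuation relation:
        r_observed = r_true * sqrt(r_xx * r_yy)
    """
    return true_r * math.sqrt(reliability_x * reliability_y)


# Reliabilities of .30 and .40 span the range estimated in the text.
for reliability in (0.30, 0.40):
    # Ceiling on validity: a perfectly reliable criterion and a true r of 1.0.
    ceiling = attenuated_correlation(1.0, reliability)
    # Hypothetical case: true r of .50 and a criterion reliability of .80.
    typical = attenuated_correlation(0.50, reliability, 0.80)
    print(f"reliability {reliability:.2f}: ceiling {ceiling:.2f}, "
          f"hypothetical observed r {typical:.2f}")
```

Under these assumed values, even a true relationship of .50 would be observed as a correlation below .30, which small samples would often fail to detect.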

For most of the work that deals with 
achievement motivation in an educational 
context, low reliability is a serious handicap 
because correlational studies with other variables
are the rule in such research. If studies
were limited to those concerning group differ- 
ences, the low reliability would be less of a 
handicap. Actually, however, interest is 
usually in providing programs or instituting 
innovations that will improve school perform- 
ance, so intragroup correlational studies are 
needed. 

It is very likely that other fantasy-based 
measures (affiliation, aggression, curiosity, 
power, and the like) suffer from many of the 
same drawbacks that afflict measures of 
achievement motivation, although I have not 
studied them at all. Users of any fantasy- 
based measure would be well advised to look 
carefully into reliability in advance. 

In closing, it seems appropriate to repeat 
sentences written by Jensen (1964) about 
the Rorschach test and apply them to fantasy- 
based tests of need achievement: 

The question of why [these tests] still have so 
many devotees and continue to be so widely used 

. is beyond the scope of this review. A satis- 
factory explanation of the whole amazing phenome- 
non is a task for future historians of psychology 
and will probably have to wait upon greater knowl- 
edge of the psychology of credulity than we now 
possess [p. 75].
SOME MISCONCEPTIONS OF FACTORS


J. P. GUILFORD 1


University of Southern California


By simple procedures of averaging coefficients of intercorrelations, two
investigators have attempted to determine whether new proposed abilities in
the structure-of-intellect model are distinct from intellectual abilities of
traditional types as in IQ scales, and even whether they are sufficiently distinct
from one another to justify separate measurement. By means of an example
it is shown that the picture of structure-of-intellect abilities cannot be
derived merely from averages of correlation coefficients. Unexpectedly large
intercorrelations can be accounted for by the fact that most tests are not univocal.
Considerable value is pointed out for differential measurement along the lines
of structure-of-intellect abilities.


1 Requests for reprints should be sent to J. P. Guilford, Box 1288,
Beverly Hills, California 90213.


The purpose of this note is to correct some
impressions regarding intellectual-aptitude
factors that have appeared in the literature.
Some very questionable operations have been
applied to correlations, leading to conclusions
that must not go unchallenged.


UNIQUENESS OF DIVERGENT-PRODUCTION
ABILITIES


In a study directed at abilities purported
to be highly relevant for creative talent, R. L.
Thorndike raised the question as to whether
the divergent-production abilities that have
been demonstrated several times by the writer
and his associates, and others, constitute an
area of aptitude distinct from that assessed
by traditional intelligence tests in IQ scales.
His approach was to average coefficients of
correlation for selected tests of (a) certain
factors within the alleged category of creativity,
(b) certain factors within a category
that he considered to be representative of IQ
scales, and (c) correlations between tests of
the two different categories. The answer he
sought depended upon a comparison of within-category
correlations with between-category
correlations. But things are not so simple as
that.

In the application of his procedure, Thorndike
(1966) chose to represent his assumed,
broad creative-aptitude factor, six divergent-production
abilities, one cognition ability, and
one convergent-production ability, as they

would be placed in the writer’s basic operation
categories. What Thorndike called a “convergent”
category included four cognition
abilities, one memory ability, and one evaluation
ability. The samplings as to content were
also somewhat haphazard. The divergent-production
(DP) group included four semantic
abilities, one figural, and one symbolic.
The non-DP group (which is more accurately
descriptive than to say “convergent”)
included two semantic, three figural, and one
symbolic ability. The sampling depended upon
what data could be found in the literature at
the time of his study.

As is demonstrated later, merely averaging
correlation coefficients in order to gain
information regarding factors is very risky
business. If factors were so clear from such
cursory examinations of intercorrelations, it
would not have taken all these years to arrive
at the factorial picture that we now
possess.

Thorndike adopted the hypothesis of
very broad intellectual dimensions, [...]
for the non-DP tests and a minor one for
DP tests. The latter factor he expected
to account for correlations between
DP tests, particularly. Now, in view of
extensive findings of zero intercorrelations, it
is unthinkable to accept hypotheses of factors
of any substantial scope. The writer
(Guilford, 1964) has reported that of
some 7,000 correlation coefficients among
tests for intellectual functions, about
18% fell below .10. Some 24% were not sufficiently
above zero to justify rejection of
the hypothesis that they were zero or possibly
negative. For the two analytical studies
from which Thorndike drew his data, the per- 
centage of correlations that could be accepted 
as being zero was 16 for the one and 18 for 
the other, among DP tests only. 

If we adopt the alternative view, that the
structure-of-intellect (SI) abilities are mutually
independent, or nearly so, we should
expect correlations between tests of different
abilities to be small, if not zero. This should
be true within the same category, such as
divergent production, as well as across categories
of abilities. Thorndike chose two tests
for each selected SI factor. For any such pair
of tests, the correlation should be of substantial
size; above .30, let us say. If the two tests
were univocal for their common factor, with
loadings of .6 on that factor and loadings of
.0 for other factors, their correlation should be
.6 × .6, or .36, not a high correlation. More
typical loadings for tests representing factors
are near .5, which would mean a correlation
of only .25 between two univocal tests. Most
correlations between tests for the same factor
are higher than that because the tests are not
univocal. They often have additional, secondary
common-factor variances of at least
[...] size. Obtained correlations of .5
[...] would almost certainly indicate that the
tests involved have one or more other factors
in common.
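
The arithmetic in the preceding paragraph follows from the usual relation for orthogonal common factors, in which the correlation between two tests is the sum of the cross-products of their loadings. A minimal sketch follows; the function name is hypothetical, and the third case (two tests that each carry a secondary loading of .5 on a second shared factor) is an assumed illustration rather than data from any study.

```python
def model_correlation(loadings_a, loadings_b):
    """Correlation implied by orthogonal common factors.

    Each argument lists one test's loadings on the same ordered set of
    factors; the implied correlation is the sum of cross-products.
    """
    return sum(a * b for a, b in zip(loadings_a, loadings_b))


# Univocal tests: one loading on the shared factor, .0 on the other factor.
print(round(model_correlation([0.6, 0.0], [0.6, 0.0]), 2))  # 0.36
print(round(model_correlation([0.5, 0.0], [0.5, 0.0]), 2))  # 0.25

# Hypothetical non-univocal tests: an extra shared factor with loadings of .5.
print(round(model_correlation([0.5, 0.5], [0.5, 0.5]), 2))  # 0.5
```

The last case shows how an obtained correlation of .5 can arise only when a second common factor contributes, which is the sense in which such tests are not univocal.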

In Thorndike's operations, with so many
[...] correlations between tests of different
factors averaged along with the more
limited number of correlations between tests
of the same factors, the averages are likely
to be quite small. This is true even for factors
within the same category, such as divergent
production. More generally, if factors
throughout the SI model are about equally
independent, and if tests reflect this condition,
one would expect to find within-category correlations
to be about the same as those between
categories.
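
Why the averages come out small can be seen in a sketch of the idealized case: two categories, six mutually independent factors in each, two univocal tests per factor, and loadings of .5. These counts mirror the design described in the text, but the strict independence, the uniform loading, and the function name are assumptions made for illustration only.

```python
from itertools import combinations
from statistics import mean


def average_correlations(factors_per_category=6, tests_per_factor=2, loading=0.5):
    """Average within- and between-category correlations for univocal tests.

    Assumes orthogonal factors, so two tests correlate (loading squared)
    only when they measure the same factor and correlate zero otherwise.
    """
    tests = [
        (category, factor)
        for category in ("DP", "non-DP")
        for factor in range(factors_per_category)
        for _ in range(tests_per_factor)
    ]
    within, between = [], []
    for a, b in combinations(tests, 2):
        # Same category and same factor: r = loading squared; otherwise zero.
        r = loading * loading if a == b else 0.0
        (within if a[0] == b[0] else between).append(r)
    return mean(within), mean(between)


within_avg, between_avg = average_correlations()
print(f"within-category average:  {within_avg:.3f}")
print(f"between-category average: {between_avg:.3f}")
```

In this idealized case only 6 of the 66 test pairs within a category share a factor, so the within-category average is barely above the between-category average of zero.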

This state of affairs is essentially what
Thorndike found. In one example, the within-DP
correlations averaged .14, the correlations
between DP and non-DP tests averaged .12,
and the within-non-DP correlations averaged
.33. From all his results Thorndike concluded:


there is some reality to a broad domain, distinct 
from the domain of the conventional intelligence 
test, to which the designation “divergent thinking" 
or “creative thinking” might legitimately be applied 
[Thorndike, 1966, p. 447]. 


Rather than accepting a creative g, he might 
have given more attention to the average of 
only .14 among DP tests. 

There is, of course, much better evidence 
for a DP category, derived from factor analy- 
sis. And that evidence is in favor of the SI 
hypothesis of 24 DP abilities, 23 of which 
have been empirically demonstrated. As for 
the entire category of “creative” abilities, it 
should be added that there are also a number 
of abilities dealing with transformations that 
can claim relevance for creative aptitude, for 
they pertain to flexibility in thinking (Guil- 
ford & Hoepfner, 1971). 


UNIQUENESS OF STRUCTURE-OF-INTELLECT
FACTORS GENERALLY


Cronbach (1970a) has also applied an
averaging technique, and one or two others, in
a few more limited areas of the SI model. He 
has come to the sweeping generalization that 
"It will almost certainly be impractical to 
measure individual standings on the separate 
Guilford factors, since the cell factors are 
overshadowed by broad factors . . . [p. 341].” 
Because he found some correlations between 
tests of different factors were not so far be- 
low those between tests of the same SI fac- 
tors, he concluded that SI abilities are corre- 
lated and that they could be mainly accounted 
for in terms of broad group factors. His 
thinking evidently involved the implicit as- 
sumption that tests are colinear with their 
factors (except for chance deviations), for
he seems to accept average correlations be- 
tween tests as indicating correlations be- 
tween factors. It is extremely rare that tests 
are colinear with their dominant factors. 

In order to make more explicit the inade- 
quacies involved in averaging correlations, let 
us consider a simple example involving just 
two abilities. The two factors represent an 
extreme instance that could well be used to 
support Cronbach's case against SI factors, if
one were to accept his procedure. The factors 
have been among the most difficult to sepa- 
rate in the writer's experience. They are for 
TABLE 1


LOADINGS OF SEVEN FIGURAL-DIVERGENT-PRODUCTION
TESTS ON FIVE DIVERGENT-PRODUCTION FACTORS


Test                           Figural factors     Semantic factors
number   Test name             DFU   DFS   DFI     DMU   DMT
20 | Makea Figure Test| .61 92 | .06 | .16 
50 | Sketches 3 | 27 | s 
10 | Dot System 16 | 02) |—.03 
22 | Make a Mark 18 | 228 | .03 
29 | Monograms | 12 | 28 14 
9 s 10 | li 00 
23 | 34 | i08 17 


Note.—DFU, DFS, DFI = divergent production of figural
units, systems, and implications, respectively; DMU, DMT =
divergent production of semantic units and transformations, respectively.


abilities divergent production of figural units 
(DFU) and divergent production of figural 
systems (DFS). Table 1 shows the factor 
loadings of four DFU tests and three DFS 
tests on those two factors, and also loadings 
on three additional selected DP factors, of 
figural implications (DFI), of semantic units 
(DMU), and of semantic transformations 
(DMT), where there are nonzero loadings that
show some of the extra contributors to correlations
among the seven tests. This small
matrix was extracted from a much larger one
with 59 tests and 23 factors represented
(Guilford & Hoepfner, 1966). 

The difficulty in obtaining a clear-cut
orthogonal solution with regard to the two
factors DFU and DFS stemmed from the
strength of the correlations of the tests across
the two factors. In terms of the orthogonal
solution achieved, reasons for these unusually
strong correlations may be seen in Table 1.
There it will be noted that one of the four
DFU tests had a significant (.30 or higher)
[...] and two of the three DFS
[...] loadings on DFU. There
[...] nonsignificant but material loadings
[...] two other DFU tests on
[...] could add as much as [...]
loadings on the three additional common factors
[...] Cronbach's operation of
[...] r_uu (for units
tests) is .38; the average r_ss (for systems) is
.41; and the average between-factors correlation
r_us is .40. The averages are thus of
equivalent size.

It is not clear how Cronbach's conclusion
that “the cell factors are overshadowed by
broad factors" was derived, for his procedure
for finding broad factors was not described.
Such a conclusion is often drawn from a
centroid analysis. Such an analysis of the
obtained correlations in Table 2 would undoubtedly
yield a dominant first-centroid factor
that would account for a large proportion
of the variances in the seven tests. Years ago
Thurstone pointed out the need for rotation
of the first centroid axis with other [...]
axes in order to achieve multiple factors that
reflect psychological meaning. This principle
is very generally accepted.

It is doubtful whether the broad factors of
which Cronbach speaks have any basic psychological
significance, and they are [...]
far from invariant, even as mathematical
variables, as test batteries are changed. He
does not name or interpret the broad factors at all
except for one reference to Vernon's [...]
educational factor. But that factor, also, lacks
invariance, and is to be found in analyses
primarily when tested populations are not
homogeneous with respect to age, sex, and
education and when factors of the SI model are
inadequately represented by tests. [...]
population as well as of tests is [...]
a good analytical study. A good factor analysis
is an experimental investigation.

The failure of a number of tests represented
in Table 1 to stay clear of factors on which
they have secondary loadings may be attributed
to failure to control what subjects
actually did, what their [...] were,
or how they conceived of the [...]
with which they dealt. The tested population
was probably sufficiently homogeneous, being
composed of ninth-grade students. Sex
differences were negligible.

As an example of failure of [...] control,
let us take the Monograms test. In each problem
of this test, the subject is given three
capital letters, for example, V, C, and M, and
is to make as many different organized monograms
as he can in limited time. For some
subjects the task might have been primarily
a matter of producing units rather than systems,
especially if the monograms could be
simple. Some problems may have taxed organizational
skills very little. The organizing
aspect is crucial for ability DFS. Other
miscarriages as between DFU and DFS can
be similarly rationalized. The seven tests have
not been revised since their first utilization in
[...], but it is clear that revisions of certain
tests should, and could, be made, or new
tests for DFU and DFS could be devised,
avoiding earlier mistakes. As in so many instances,
hindsight is better than foresight.


2 By contrast, a high degree of invariance has
been demonstrated for SI factors, in spite of [...]
in test batteries and tested populations (Guilford &
Hoepfner, 1971).

In order to see something of the extent to
which factors other than DFU and DFS
[...] contributed to the correlations in
Table 2, the reduced correlations appearing
in the lower-left section of that table were
computed from the appropriate factor loadings
in Table 1. For example, the reduced
correlation between Tests 20 and 50 was computed
from the sum of the cross-products of
factor loadings, .61 × .52 + .41 × .21, which
equals .40. Except for this particular coefficient
and one other, the reduced correlations
are lower than the corresponding obtained
ones, as would be expected. Where they are
[...], there are probably some small
negative loadings on factors not mentioned in
the table which should help to balance things
out. Elsewhere there are some notable deviations
between obtained and reduced correlations
in the expected direction. For example,
for [...] the difference is .19, and we can
see how the DMU and DMT loadings for
both tests would add something to reduce
the difference. These two abilities are
semantic rather than figural. But it is reasonable
to suppose that in producing figural ideas,
the [...] of semantic or abstract-thought
[...], such as the conceiving of real
objects, [...] translation from semantic to
[...] content could have helped many subjects
gain in quantity of output. Another example
is the difference .17 for the correlation
[...]. We see that the common factor



TABLE 2


OBSERVED AND REDUCED INTERCORRELATIONS AMONG
FIGURAL-DIVERGENT-PRODUCTION TESTS


Test                  Test number
number    20    50    10    22    29     9    23


20 - 39 45 ES .52 54 40 
50 A0 mi 33 ES 49 12 33 
10 |.36 31 — 36] 29 4i 35 
22 | 38 29 28 — 3i 35 32 
29 | H .30 .28 .30 — A0 35 
9 | 44 .30 .29 30 | 4H ET 


23 330 49 [8 ..19 |.33 82 


Note.—Coefficients at the upper right were
obtained correlations [...]
from the original report (Guilford & Hoepfner, 1966).


DFI alone could add as much as .05 to the 
reduced r. In both examples, the remaining 
deficits could be made up by accumulation of 
still other minor cross-products. 
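
The reduced correlation quoted for Tests 20 and 50 can be checked against the usual relation for orthogonal common factors. The worked equation below is offered only as a check on that arithmetic; the subscript notation (a_{jp} for the loading of test j on factor p) is an assumption introduced here, and the loadings are those quoted in the text for factors DFU and DFS.

```latex
r_{jk} = \sum_{p} a_{jp}\, a_{kp}
\qquad\Rightarrow\qquad
r_{20,50} = (.61)(.52) + (.41)(.21) = .3172 + .0861 = .4033 \approx .40
```

Any further common factor on which both tests load adds its own cross-product to this sum, which is how minor loadings can account for the gap between reduced and obtained correlations.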


Differential Measurement of Abilities 

As quoted earlier, Cronbach concluded that 
it is impractical to measure individual status 
in the different SI abilities, owing to what he 
considers to be high correlations between the 
different factors. In the example of the two 
rather similar factors featured in this note, 
both abilities are concerned with divergent 
production of visual-figural information. Only 
the kind of product differs. It was noted that 
the test intercorrelations across the two fac- 
tors averaged about .40. Is this much corre- 
lation so high as to render differential mea- 
surement impractical? One pair of tests cor- 
relates only .29. Improved test construction 
might even lower this quantity. Among many 
other pairs of SI factors, pairs of representa- 
tive tests that correlate much less can be 
found. 

A number of studies on the prediction of 
achievement criteria from multiple-regression 
equations with SI-factor tests have strongly 
attested to the value of measuring and 
weighting SI abilities separately, using both 
factor-test scores and derived factor scores. 
Examples are the predictions of achievement 
in ninth-grade mathematics (Guilford, Hoepf- 
ner, & Petersen, 1965) and in tenth-grade
geometry (Caldwell, Schrader, Michael, & Mey-
ers, 1970). Not to keep SI scores separate 
loses considerable information and precludes 
optimal selection and weighting of components 
when composite predictors are needed. That it 
pays to make multiple predictions from SI- 
factor measures in an academic-type learning 
situation has also been demonstrated re- 
cently by Hoepfner, Guilford, and Bradley 
(1970).* 


Through the courtesy of Professor Cronbach,
the writer has had access to a later critical treatment
he has given to SI factors, prior to publication
(Cronbach, 1970b). Instead of averaging correlations,
and in the area of divergent production
only, he used comparisons of proportions of coefficients
greater than .40 between tests for SI factors
having divergent production and also content or
product, or both, in common, with proportions for
tests of factors having only divergent production in
common. He concluded, apparently with respect to
the entire structure of intellect, that there was differentiation
among factors of different content but
not among those of different products. This differential
result was not surprising to this writer, who
has been aware for a long time that in writing tests
within an operation category it is much easier to
control things so as to avoid bidirectional relationships
of tests to factors of different content than it
is to avoid such overlappings with respect to different
products. His new procedure is also very
questionable. As pointed out earlier, high correlations
almost invariably mean that tests lack univocality.
What Cronbach finds by this method, then, is proportions
of factorially ambiguous tests. Optimal test
construction should reduce materially the numbers of
such tests. In spite of the extensive overlappings of
THE MEANING AND RELIABILITY OF ACCURATE
EMPATHY RATINGS:
A REJOINDER


CHARLES B. TRUAX *


University of Calgary


In response to an earlier article asserting that accurate empathy ratings reflect
a quality other than that defined by the scale and suggesting that reliability
estimates are in general inflated and may be related to the number of therapists
being rated in a given study, the present article presents counterarguments.
Research evidence and arguments are presented that demonstrate that the
Accurate Empathy Scale tends to measure what theorists and lay people in
general think of as understanding versus not understanding; that there is in
fact no relationship between the reliability estimates per study and the number
of therapists being rated; and finally, that the reliability estimates used in
most of the studies are appropriate and generally accepted as so by competent
statisticians.


Chinsky and Rappaport (1970) have asserted
that (a) ratings using the Accurate
Empathy Scale "clearly" are responding to
some quality other than that which the scale
itself defines as accurate empathy (AE);
(b) there is a relationship between the reliability
of the ratings of AE and the number
of therapists involved per study; and (c)
the reported reliability of the AE scale is
"spuriously" inflated.

The first is a serious assertion and, if it
were true, would alter our understanding of
the mounting research evidence indicating a
[...] relationship between the level of
[...] empathy offered by a therapist, counselor,
[...] or lay person and specific outcome
[...] such as personality change in
[...], greater time out of institution [...]
hospitalized mental patients and juvenile delinquents,
improved grade point averages for
[...] underachievers, improved work performance
[...] vocational rehabilitation clients,
[...] improved reading achievement in elementary
school children (Truax & Carkhuff,
1965; Truax & Mitchell, 1971).

If the second or third assertions were true
it would simply mean that the measurement 
of AE was more crude than supposed and, 
therefore, with better measurement even 
stronger findings should obtain. Although the 
present author does not agree with these latter 
two assertions, space does not allow discus- 
sion here. Suffice it to say that more recent 
data with larger samples of therapists do not 
support their second argument and that virtu- 
ally all studies since 1966 have used Ebel 
(1951) intraclass correlations, which take
into account the number of therapists or cases


involved. Ebel’s correlations tend, in fact, to 
be higher than Pearson's product-moment 


correlations. 

Chinsky and Rappaport’s main argument 
that the AE ratings are not measuring accurate
empathy as defined by the scale is based
upon a client-centered study (Truax, 1966) 
that compared ratings of AE on 50 four-min- 
ute samples where both therapist and patient 
responses were available with the same 50 
tape-recorded samples in which the patient 
responses had been removed (therapist re- 
sponses only). They argued that since there 
was no significant difference in the mean level 
of AE between the edited and nonedited sets 
of tape-recorded samples, and since the cor- 
relation between the edited and nonedited 
approximated the reliability of the scale itself 
for that study, the raters could not have been 
rating empathy since empathy was contingent
upon the content and/or feelings of the 
patients' statements. 

Not so. 

Not many therapists, let alone trained AE 
raters, would have difficulty in rank ordering 
the following three therapist responses on AE: 
(a) “How long ago did that happen?"; (b)
"That must have made you upset"; and 
(c) "I sense that made you more than just 
angry. As if you felt you really wanted to 
smash or destroy something." Moreover, the 
raters had more than single responses to go 
on. Since time samples were used, the raters 
had sequences of therapist responses, and 
since therapists do not respond to patients 
independently of the patient's response, they 
could make judgments about whether or not 
the therapist's attempts at empathy were correct
or incorrect by the nature of succeeding

therapist responses. Quite simply, by listening 
only to the therapist responses (especially in 
client-centered therapy), one will hear a series 
of contingent responses from which one can 
reasonably judge whether or not the “thera- 
pist seems completely unaware of even the 
most conspicuous of the client's feelings" 
(Stage 1 of AE); whether the "therapist 
shows an almost negligible degree of accu- 
racy... and that only toward the client's 
most obvious feelings" (Stage 2); or whether 
the "therapist often responds accurately to 
client’s more exposed feelings" (Stage 3). It 
must be added that all AE raters are specifically
trained to listen as much as possible
only to the therapist responses and to not
be influenced by the patient content.


That AE is not measuring some global
[...], as Chinsky and Rappaport
asserted, is shown by two of the studies they
quoted. Although in general AE is positively
correlated with the genuineness [...],
in one study (Truax, [...], Frank, [...],
Hoehn-Saric, Nash, & Stone,
1966) there was a strong negative correlation
between AE and nonpossessive warmth and
in the other (Truax, Carkhuff, & Kodman,
1965) there was a strong negative relationship
between AE and genuineness.

As to direct evidence, Shapiro (1968)
studied the relationship between AE ratings
by trained raters and evaluations by people
untrained in psychology, counseling, or
therapy who were asked to listen to the [...]
therapy samples used by raters and to
evaluate the therapist on a 7-point semantic
differential of understanding-not understanding.
Shapiro (1968) obtained a correlation of
.67. He thereby concluded that the AE scale
showed adequate construct validity [...]
ratings strongly correlated with what people
in general thought of as understanding-not
understanding.

Further, a recent study by Lister and
Truax 3 correlated the ratings of accurate
empathy with Porter's (1943) [...] measures of
counseling interview procedures. The study,
involving 58 different counselors [...]
independent evaluations using the [...]
Porter's understanding scale [...]
(p < .01) with accurate empathy, while there
was no correlation between AE and the interpretative
scale, the supportive scale, or the
evaluative scale. These two different [...]
for evaluating therapist responses [...]
lend credibility to each other.

The evidence, then, suggests that AE is
measuring something very much like what it
is claimed to measure.


3 J. Lister & C. B. Truax. The relationships between
accurate empathy and nonpossessive warmth
and the Porter scales. Unpublished [...], University
of Florida, 1970. (Available from
J. Lister, Chairman, Department of [...] Education,
University of Florida, Gainesville, [...]
32601)
ACCURATE EMPATHY:
CONFUSION OF A CONSTRUCT? 


JULIAN RAPPAPORT * 


University of Illinois at Urbana-Champaign 


JACK M. CHINSKY 


University of Connecticut 


Empirical and logical evidence is presented to refute the Truax rejoinder to 
an earlier critique by the authors. It is argued that the construct accurate 
empathy is markedly confused. The accurate empathy scale is shown to lack 
discriminant validity, and its relationship to therapeutic outcome is considered. 
The meaning of ratings of empathy in the absence of client responses is ques- 
tioned. Finally, the use of repeated measurement of the same therapist is shown 
to be faulty on both statistical and design grounds and to yield spuriously 


inflated reliability coefficients. 


The Truax (1972) response to a critique 
of the meaning and reliability of accurate 
empathy (AE) ratings (Chinsky & Rappa- 
port, 1970) fails to demonstrate either the 
construct validity of the dimension or the 
methodological appropriateness of its measured
reliability. The AE scale has not been
shown to measure what it purports to mea- 
sure; indeed, there is evidence that it is more 
highly related to other variables than to the 
accurateness of a therapist’s empathy. In 
short, the scale lacks discriminant validity. 

In addition, the construct validity of the 
scale is not, as Truax (1972) suggests, sup- 
ported by the demonstration of a positive rela- 
tionship between AE and therapeutic out- 
come. The contention that outcome data clar- 
ify the meaning of AE would not hold even 
for well-documented outcome studies, and this 
is certainly not the case when the outcome 
studies themselves may be of questionable va-
lidity. Furthermore, to assume that one can
measure “both the therapist’s sensitivity to
current feelings and his verbal facility to com-
municate this understanding in a language
attuned to the client’s current feelings [Truax
& Carkhuff, 1967, p. 46]” in the absence
of the client's statements raises serious ques-
tion as to the meaning of the measurement.
Finally, the use of repeated measurement of
the same therapists, particularly when the
number of therapists is small, fails to meet
the logical or the statistical assumptions of
a reliability coefficient. Empirical and logical
arguments are presented to support each of
the above contentions.


DISCRIMINANT VALIDITY 


It is a fairly well-accepted principle that
in order to establish the construct validity of
a given psychological assessment device it is
necessary to demonstrate both convergent
and discriminant validity (Campbell & Fiske,
1959). In other words, if a scale is mea-
suring that which it is supposed to measure,
it should be more highly related to other
measures of the same dimension than to mea-
sures of supposedly independent psychological
dimensions. The AE scale is
deficient in this regard.


[...] recent Truax (1972) response to the Chinsky
and Rappaport (1970) critique. [...] a
study by Shapiro (1968) which found that
AE ratings correlated positively [...]
with 7-point semantic differential [...]
understanding-not understanding. [...] this
datum he concludes that the [...] valid-
ity of the AE scale is supported. Unfortu-
nately, Truax fails to mention the [...]
of the correlation matrix. [...]
understanding-not understanding
correlated even higher with ratings of
therapist warmth (r = .87) and genuineness
(r = [...]). Nor does he mention that AE
correlated higher (r = .71) with the semantic
differential evaluative dimension (good-bad)
than it did with understanding-not under-
standing. These data support the earlier con-
tention (Chinsky & Rappaport, 1970) that a
more general therapist quality is represented
by AE ratings. Certainly the Truax (1972,
p. 398) assertion “That AE is not measuring
some global ‘good’ quality . . .” is contra-
dicted rather than supported by Shapiro's
(1968) findings.²
Further evidence of the confusion of the
construct is again provided by Truax. He
[...] “in general AE is positively cor-
related with the genuineness scale and the
nonpossessive warmth scale . . . [but that in
the studies cited by Chinsky and Rappaport,
there was a] strong negative relation-
ship . . . [Truax, 1972, p. 398].” It would
seem to the present authors that such data
[...] point to the ambiguity of the AE
construct.
Two recent studies (Caracena & Vicory,
1969; Truax, 1970) lend further evidence to
the difficulty in understanding the meaning
of the AE scale. These studies found AE
ratings to be significantly and positively
related to the number of words spoken by the
therapist and the proportion of therapist-
client conversation spoken by the therapist.
What this has to do with AE as defined by
the scale is unclear. Caracena and Vicory
(1969) interpreted their data as congruent with
the Chinsky and Rappaport (1970) and the
Kiesler, Mathieu, and Klein (1967) hypothe-
sis that something other than empathy is
being rated. They argued specifically that
“[...] may be forced to depend on
[...] objective interviewer behaviors
that are more readily available to them than
is information about an abstract variable such
as empathy [p. 514].”

² The other study cited by Truax (1972) as
[...] [Lister & Truax. The relationship
between accurate empathy and nonpossessive warmth
and the Porter scales. Unpublished manuscript, Uni-
versity of Florida (1970)] was not available for
[...] at the time of this writing.


RELATIONSHIP OF ACCURATE EMPATHY
TO THERAPEUTIC OUTCOME


Truax (1972) cites the supposed positive 
relationship between AE and various mea- 
sures of therapeutic outcome as support for 
the validity of the construct. Chinsky and 
Rappaport (1970) made no attempt to sys- 
tematically review the adequacy of AE out- 
come studies, nor do they do so here, simply 
because they still do not know what AE is. 
It may be premature to attempt outcome 
research on a variable that is so poorly 
understood. The above point is admittedly 
debatable in that it is based on one's phi- 
losophy of science. What is not debatable, 
however, is the fallaciousness of the reason- 
ing that correlational data between outcome 
and a given variable indicate either a cause-
effect relationship or an understanding of the
variable.

The use of outcome results to clarify the 
meaning of the AE scale would be question- 
able as a scientific strategy regardless of the 
adequacy of the outcome studies, but the 
problem is particularly acute when the out- 
come studies themselves are of questionable 
validity. The danger of using outcome results 
to clarify the meaning of the AE scale may 
be illustrated by the examples cited below. 

Truax (1970) found the proportion of 
words spoken by the therapist per case to be 
significantly correlated with two overall mea- 
sures of patient improvement, and to AE. 
Truax (1970) concluded from these data 
that “the higher the proportion of therapist 
talk, within the normal limits, the more em- 
pathic the talk and the better the patient out- 
come [p. 541]." Not only is this finding con- 
fusing, as is mentioned above in regard to the 
meaning of the construct AE, but the conclu- 
sion is not warranted by the data. Specifically, 
the outcome referred to is based solely on cri- 
teria of global improvement ratings completed 
by the therapist and by the patient. These 
data are subject to numerous design problems 
(e.g., rater bias, expectancy, hello-good-bye
effect) such that most researchers agree that
independent measures of therapeutic outcome
are necessary. In addition, the global ratings
were inconsistent with the results of the more
specific therapist ratings of target-symptom
improvement and patient-reported discom-
fort. Finally, the relatively more indepen-
dent ratings by a “research interviewer” of
“social ineffectiveness” did not support the
conclusion. 

An earlier study (Truax, Wargo, Frank, 
Imber, Battle, Hoehn-Saric, Nash, & Stone, 
1966), which purports to confirm the rela- 
tionship of AE, genuineness, and warmth to 
therapeutic outcome, is of questionable valid-
ity on several grounds. In that study a com-
bined measure of the “therapeutic conditions”
(AE, warmth, and genuineness) was found to
relate to patient- and therapist-reported
global improvement measures, similar to those
cited above (Truax, 1970). Once again, no
relationship was found between the therapeu-
tic conditions and specific measures of patient-
reported discomfort or target-symptom
improvement. Nor were the therapeutic condi-
tions related to the ratings by a research
interviewer of social ineffectiveness.

Further examination of the data presented 
as supportive of the relationship between 
therapeutic conditions and global, self- 
reported outcome reveals the use of two- 
tailed F tests of the difference between means, 
which is in itself a questionable statistical 
technique (see Hays, 1963, p. 369). From 
these data, Truax et al. (1966) concluded 
that “the importance of the three conditions
combined . . . to therapeutic outcome is
clearly supported in this study [p. 400].”
Once again the conclusions may not be 
justified by the data. 

In the same study (Truax et al., 1966)
ratings of warmth alone were found to be
negatively related to patient-reported global
improvement and unrelated to the other out-
come [...]. An earlier study (Truax,
[...]man, 1965) found a negative
[...] genuineness scale and
[...] demonstrating a nega-
tive relationship between AE and outcome has
also been reported (Truax & Carkhuff, 1967).

[...] from these apparently contradic-
tory results, Truax et al. (1966)
posit a best-two-out[...]
they concluded “it [...]
conditions are [...]
other [p. 400].”
[...] the above do not,
as Truax (1972) argues, shed light on the
validity of the AE construct. Rather, they
document the serious state of confusion with
regard to the meaning of the AE scale, as
well as the need to reconsider the conclusion
that AE predicts to therapeutic outcome. A
critical review of the AE outcome literature
may be in order before further claims of a
relationship between AE ratings and patient
improvement can be made.


Is THE CLIENT NECESSARY FOR AN 
EFFECTIVE THERAPEUTIC 
INTERACTION ? 


Chinsky and Rappaport (1970) cited a
study (Truax, 1966) that found no difference
in mean AE ratings with or without client
statements. In addition, the two sets of ratings
were highly correlated (r = .68). It was our
contention that these data provided evidence
that raters must be responding to something
other than the therapist's AE. “How can one
assess the accuracy of a therapist's empathy
unless there is someone to whom the therapist
is responding [Chinsky & Rappaport, 1970,
p. 380]?” Truax (1972) argues that thera-
pists or trained raters would have no dif-
ficulty in doing so, and that, in fact, raters
are told specifically not to pay attention to
the client’s statements. The implication is
that the client does not matter and that, in
fact, if one sets up a tape recorder on which
are recorded highly rated AE statements and
plays these to a client, regardless of what the
client says, the tape recorder [...] foster
positive behavior change. What has happened
to the “client” in “client-centered therapy”?

While Truax (1972) argues that therapist
responses can be easily rated independent of
the client, the AE scale itself [...] the
client. For all the rater knows, [...] the
therapist is responding, “I sense [...]
you more than just angry. As if you
really wanted to smash or destroy some[...]
[Truax, 1972, p. 398],” the client could be
“belly laughing.” More importantly, it may
be possible to rank order the [...] pre-
sented by Truax (1972) in terms of what
sounds like a “good” empathic, [...]
series of statements, but the [...]
by the AE scale are far more specific and
client related. For example, [...]
Stage 7 as: “Therapist responds accurately
to most of the client's present feelings and
shows awareness of the precise intensity of
most of the underlying emotions [Truax &
Carkhuff, 1967, p. 54]”; while Stage 6 is
defined as therapist also “recognizes most
of the client's present feelings, including those
which are not readily apparent . . . [but]
he sometimes tends to misjudge the inten-
sity of these veiled feelings . . . [Truax &
Carkhuff, 1967, p. 52].”
Not only is the client necessarily a part of
AE ratings but such ratings are far more
difficult than Truax (1972) implies.
Empirical evidence of the difficulty in
making AE ratings is found in the Caracena
and Vicory (1969) study mentioned above,
and in papers by Burstein and Carkhuff
(1968) and Carkhuff and Burstein (1970).
These latter authors pointed out that the
client perceptions of AE are frequently un-
related to the therapist's self-ratings, or to
trained judges' evaluations of AE. They argue
that this is because client perceptions are
distorted, and “only those persons who are
themselves functioning at high levels (of the
therapeutic conditions) can make accurate
discriminations . . . [Carkhuff & Burstein,
1970, p. 395].”
Caracena and Vicory (1969) tested the
[...] assumption by using nonclient fresh-
men and sophomores as interviewees and ask-
ing them to rate interviewers on a measure
of [...] empathy. They pointed out that
such estimates should be related to the AE
ratings of trained judges who are chosen from
the same population as the interviewees. Their
finding of no relationship between trained
judges' evaluations of AE and the [...]
“recipients” of the supposed AE led
them to conclude that there must be some-
thing specific to the training situation that
accounts for the ability to discriminate AE.
Since they were using “normal” interviewees,
distortion on the part of clients (Carkhuff
& Burstein, 1970) is an untenable assumption.
In view of the above, not only does it
seem unreasonable to assume that the accu-
racy of a therapist’s empathy can be deter-
mined in the absence of client statements,
but it is certainly not true that AE discrimi-
nation is as easy as Truax (1972) suggests.
Indeed, this raises the further question of
its meaningfulness. If AE cannot be perceived
by its supposed recipients, but only by
“trained raters,” can this be what accounts
for behavior change? Rogers (1957), con-
sidering this very question, stated as the final
condition for therapeutic personality change

that the client perceives, to a minimal degree, the
acceptance and empathy which the therapist experi-
ences for him. Unless some communication of these
attitudes has been achieved, then such attitudes do
not exist in the relationship as far as the client is
concerned, and the therapeutic process could not,
by our hypothesis, be initiated [p. 99].

[...] the Burstein and Carkhuff (1968) and
Carkhuff and Burstein (1970) articles report the [...]


USE OF REPEATED MEASUREMENT OF A
SMALL NUMBER OF THERAPISTS


Truax (1972) argues that “recent data with 
larger samples of therapists” do not support 
the Chinsky and Rappaport (1970) hypothe- 
sis that reliability may be related to the 
number of therapists. Since he fails to cite 
any “recent data,” it is difficult to respond 
directly to that contention. He implies that 
the use of the Ebel (1951) intraclass correla- 
tion coefficient accounts for the problem of 
repeated measurement, particularly of a small 
number of therapists by the same raters. This 
statistic, which is based upon analysis of vari- 
ance, is useful in determining the percentage 
of variance attributable to true variance (i.e., 
subjects) and the percentage of variance at- 
tributable to error (i.e., raters), when multi- 
ple raters are used. These percentages may 
then be employed to determine “reliability,” 
but they do not eliminate the problem of 
repeated measurement. Truax treats his data 
as if they come from a completely indepen- 
dent design, when, in fact, segments derived 
from the therapist are not independent of 
one another. This method allows for the emer- 
gence of heterogeneity of covariance, which 
will, in general, tend to produce an over- 
estimate of the percentage of variance attribu- 
table to subjects, and so will produce a spuri- 
ously inflated estimate of reliability. The 
problem of the effect of correlated errors on 
reliability coefficients within an analysis of 
variance model has been dealt with as a 
statistical problem by Maxwell (1968). 
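
As a rough illustration of the statistic under discussion, the following sketch computes an analysis-of-variance intraclass correlation of the general kind Ebel (1951) describes, for a table in which each row is one rated segment and each column is one rater. The function name, the one-way layout, and the toy ratings are assumptions introduced here for illustration; none of them come from Truax's data. Note that the formula treats every row as an independent subject, which is the very assumption questioned above when many segments are drawn from the same few therapists.

```python
def intraclass_r(ratings):
    """One-way ANOVA intraclass correlation (illustrative sketch).

    ratings: list of rows, one row per rated segment, one column per rater.
    Returns (reliability of a single rater, reliability of the averaged raters).
    Assumes at least two rows, two raters, and some between-row variation.
    """
    n = len(ratings)          # number of rated segments ("subjects")
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-subjects and within-subjects (error) sums of squares.
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means)
                    for x in row)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    averaged = (ms_between - ms_within) / ms_between
    return single, averaged


if __name__ == "__main__":
    # Hypothetical ratings: three segments, two raters.
    print(intraclass_r([[1, 2], [3, 4], [5, 6]]))
```

If the rows are in fact repeated segments from the same therapist, the between-row mean square partly reflects that therapist's stable characteristics (or the raters' memory of them) rather than independent information, so a high coefficient from such a table should be read with the caution urged in the text.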

Even if the above were not the case, the 
problem may be approached as one of design
rather than statistics. Since there is no point
in repeating our argument with the specific 
example (Truax, 1966) cited in the earlier 
critique (Chinsky & Rappaport, 1970), a 
more recent example of the problem is offered 
here (Truax, 1970). In this study, 40 patients 
were each assigned to one of four therapists 
(10 patients per therapist). Six 3-minute seg- 
ments from each patient-therapist dyad were 
recorded and listened to by a total of four 
raters. Thus, each of the four therapists was 
heard by the raters 60 times (10 x 6). 

If one recalls “that all AE raters are spe- 
cifically trained to listen as much as possible 
only to the therapist responses . . . [Truax, 
1972, p. 398],” how surprising is it that rat- 
ings of the same four therapists each heard 
60 times by the same raters are reliable? In 
our own attempts to train raters in the use 
of the AE scale, the experience that stimu- 
lated our earlier critique, we found that when 
we had raters rate the same therapists several 
times they reported to us that they could not 
help but remember how they had rated each 
therapist previously. It is reasonable to as- 
sume that any college student who is capable 
of discriminating therapist empathy ought to 
be astute enough to recognize the voices of 
four therapists repeated 60 times each, and 
to continue to rate those therapists consist- 
ently on the basis of recognition alone. We are 
forced to wonder why Truax presumes that 
his raters are not affected by this obvious 
design problem, and we repeat our earlier
suggestion that researchers use either a large
number of therapists, each rated once, or a
large number of raters, none of whom rate a
[...]
estimate of reliability, although the meaning
[...] in question.

[...] from the above argu-
ments is that the construct AE is markedly
confused. [...] repeated-measurement de[...]


EXTENSION OF MULTIPLE-RANGE TESTS TO INTERACTION
TABLES IN THE ANALYSIS OF VARIANCE: 
A RAPID APPROXIMATE SOLUTION 


DOMENIC V. CICCHETTI : 


Veterans Administration Hospital, West Haven, Connecticut 


The research literature discusses the multiple-range tests applied to means 
derived from the one-way analysis of variance, in which none of the possible 
contrasts is confounded. Often, however, it is necessary to compare means in
an interaction table derived from a factorial analysis of variance. In this case,
the only unconfounded comparisons are those made within rows and columns.
The present study discusses an approximate solution that adjusts the number
of treatments by basing the q statistic upon the number of unconfounded
comparisons only. The solution is then applied, using actual data, (a) when
only the K(K − 1)/2 contrasts are desired (the method of Tukey) and (b)
when all possible contrasts are desired (the method of Scheffé). The Duncan
and Newman-Keuls tests are deemphasized, since research demonstrates these 
tests fail to control adequately for Type I error. 


oped by Newman and Keuls do not. Petrino- 
vich and Hardyck applied the above tests to 
means derived from a one-way analysis of 
variance. The situation is more complex when 
the multiple-range tests are applied to means 
based upon the interaction of two or more
factors. In the simplest 2 × 2 interaction
table, there are four cell means and six pos-
sible paired comparisons, two (or one-third)
of which are not readily interpretable, as
shown in Table 1.

It is clear that if one compared either Cells
A₁B₁ and A₂B₂ or Cells A₂B₁ and A₁B₂,
one could not determine how much of the dif-
ference to attribute to Factor A, and how
much to attribute to Factor B, all other things
being equal. This problem of interpretation
does not occur in the remaining four paired
contrasts.²

² Although it is true that the only unconfounded
comparisons in any given interaction table are those
made within rows and columns, this author can con-
ceive of certain instances, especially in nonpsycho-
logical research, in which one might wish to make
paired comparisons that are confounded. For ex-
ample, let us assume that in a 2 × 2 experimental
design, the two factors, A and B, refer to tempera-
ture and pressure, respectively, and moreover, that
A₁B₁ represents the speed with which a given
chemical reaction occurs under standard operating
conditions, whereas A₂B₂ represents the speed with
which the same chemical reaction occurs under con-
ditions which are not standard. One might conceiv-
ably be interested in comparing A₁B₁ with A₂B₂

TABLE 1


ILLUSTRATION OF CONFOUNDED COMPARISONS
IN A 2 × 2 (A × B) INTERACTION TABLE


              Factor B
Factor A    B₁        B₂

A₁          A₁B₁      A₁B₂
A₂          A₂B₁      A₂B₂


Note.—The four unconfounded comparisons [...]


The number of confounded comparisons in- 
creases the greater the number of cells in a 
given interaction table. Thus, for the simple 
3 X 4 analysis of variance, the proportion of 
confounded comparisons is about 55%, as 
shown in Table 2. The importance of this fact 
is that the more comparisons one is making 
in any given experiment, the greater the prob- 
ability that some of these comparisons will be 
significantly different from each other, by 
chance alone. In order to correct for this, the 
difference between any set of means, to be 
judged statistically significant, must increase,
the greater the number of means one is com-
paring. If one bases the q statistic on 12
means (allowing one to make 66 comparisons) 
and only 30 of these comparisons make any 
logical sense, then one is penalized by being 
forced to accept a minimal significant dif- 
ference based upon 66 comparisons, when only 
30 of these can be meaningfully interpreted. 
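
The counts quoted above can be checked with a few lines of code. The sketch below is only an illustration (the function name is an assumption made here, not the author's); it counts all paired comparisons among the cells of an interaction table and separates those made within a row or within a column from the confounded remainder.

```python
from math import comb


def comparison_counts(rows, cols):
    """Return (total, unconfounded, confounded) paired comparisons
    for a rows x cols interaction table."""
    cells = rows * cols
    total = comb(cells, 2)                      # K(K - 1)/2 with K = number of cells
    # Unconfounded pairs differ on one factor only:
    # pairs within each row plus pairs within each column.
    unconfounded = rows * comb(cols, 2) + cols * comb(rows, 2)
    return total, unconfounded, total - unconfounded


if __name__ == "__main__":
    for shape in [(2, 2), (2, 3), (3, 4)]:
        total, unconf, conf = comparison_counts(*shape)
        print(shape, total, unconf, conf, round(100 * conf / total, 1))
```

For the 3 × 4 table this gives 66 comparisons in all, 30 unconfounded and 36 confounded, which is the "about 55%" cited in the text; for the 2 × 2 table it gives the 6, 4, and 2 of Table 1.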


The Winer (1962) Approximation 


A “fragmentized” approximate solution to
this problem has been offered by Winer
(1962), who presents the interaction table
[...] a 2 × 3 factorial (A × B) analysis of
[...] independent groups, each
[...] subjects. Winer’s solution is
[...] the comparisons within each
[...] separate, “logical groupings”
[...] a₁ and a₂ [...]; the com-
parisons [...] a₁b₁ versus a₁b₂; a₁b₁
versus a₁b₃; and a₁b₂ versus a₁b₃. [...]
logic would apply to row a₂, in which the com-
parisons would be a₂b₁ versus a₂b₂; a₂b₁
versus a₂b₃; and a₂b₂ versus a₂b₃. [...]
approach is partial in that it makes no pro-
vision for comparing the three pairs of logical
groupings of unconfounded column means,
that is, a₁b₁ versus a₂b₁; a₁b₂ versus a₂b₂;
and a₁b₃ versus a₂b₃. One would [...]
might consider each of the three [...] com-
parisons as a separate, two-mean [...] in
the same sense that each of the two [...] of
row means was considered as a set of three
separate, paired-mean contrasts.
Even granting the above, [...] one
must regard Winer's solution as [...]
one which treats a single experiment with six
cell means as if it were five [...] experi-
ments, that is, one based upon [...] means
within a₁; another containing the [...]
row a₂; and three experiments based [...]
the two means within each of the [...] col-
umns, b₁, b₂, and b₃. Winer's [...] solu-
tion appears inadequate for the application of
the Tukey or Scheffé methods, whose [...]
logic depends upon basing the alpha error [...]
the q statistic on the whole [...] rather
than upon fragmentized portions [...].

[...] regardless of how [...] to the change attributa-
ble [...] in pressure [...]
of this type, [...] necessitated [...]
discussed in this study are [...]


The Cicchetti Approximation 


The logic of the author's [...]
based upon the relationship between the
number of K treatments and the number of
K(K − 1)/2 paired comparisons. An inspec-
tion of Table 3 reveals that any con-
secutive increase in K results in an [...]
increase in the total possible number of
K(K − 1)/2 paired comparisons.


TABLE 2


[...] 3 × 4 [...]


              Factor B
Factor A    B₁        B₂        B₃        B₄

A₁          A₁B₁      A₁B₂      A₁B₃      A₁B₄
A₂          A₂B₁      A₂B₂      A₂B₃      A₂B₄
A₃          A₃B₁      A₃B₂      A₃B₃      A₃B₄


Note.—The total number [...] given by K(K − 1)/2 [...] unconfounded [...]


If one examines each successive number of
[...] in Table 3, it is seen that as K
[...] 3, to 4, to 5, to 6, to 7, for
example, the number of K(K − 1)/2 com-
parisons [...] from 3 to 6 (3), to 10 (4),
to 15 (5), to 21 (6). This arithmetic pro-
gression continues to 190, an increase of 19
over the previous 171 K(K − 1)/2 compari-
sons, based upon 19 treatments. For compar-
ing means from a one-way analysis of vari-
ance, one would base the q value upon the
number of K treatments and then perform
the Tukey test if one is interested in the
K(K − 1)/2 paired contrasts only, or the
Scheffé test if one wishes to perform the
K(K − 1)/2 comparisons, as well as other
possible comparisons, based upon various
combinations of the K treatments.

In order to solve the problem of confounded
comparisons of treatments from interaction
tables, in a factorial analysis of variance, the
solution proposed here bases the number of
treatments (K′), for the q statistic, upon the
number of unconfounded comparisons only.
Table 4 shows the relationship between the
number of unconfounded comparisons and K′.
Here it can be seen that when the number of

TABLE 3


NUMBER OF K(K − 1)/2 PAIRED COMPARISONS
AS A FUNCTION OF THE NUMBER OF K TREATMENTS


Number of          Number of K(K − 1)/2
K treatments       comparisons

 3                   3
 4                   6
 5                  10
 6                  15
 7                  21
 8                  28
 9                  36
10                  45
11                  55
12                  66
13                  78
14                  91
15                 105
16                 120
17                 136
18                 153
19                 171
20                 190


TABLE 4


NUMBER OF ADJUSTED K′ TREATMENTS AS A FUNCTION
OF THE NUMBER OF UNCONFOUNDED PAIRED
COMPARISONS IN A GIVEN
INTERACTION TABLEᵃ


Number of unconfounded     Number of adjusted
comparisons                K′ treatments

3-4                         3
5-8                         4
9-12                        5
[...]                       6
[...]                       7
[...]                       8
[...]                       9
[...]                      10
[...]                      11
[...]                      12
[...]                      13
[...]                      14
99-112                     15
113-128                    16
129-144                    17
145-162                    18
163-180                    19
181-200                    20


ᵃ [...] to inter[...]
[...] presented at the [...]
New York, August 20, 1969.

unconfounded comparisons is 3, the number
of K′ treatments is 3; when the number is 6,
K′ = 4; when there are 190 unconfounded
comparisons, K′ = 20. Table 4 is analogous
to Table 3, except that Table 3 is based upon
the total number of K(K − 1)/2 compari-
sons, whereas Table 4 is based upon the num-
ber of unconfounded paired comparisons only.
Table 4 also includes the interpolated (to the
nearest whole number) intermediate K′ val-
ues, that is, between 3 and 4, 5 and 8, 9 and
12 . . . , 181-200 K′(K′ − 1)/2 compari-
sons. Given this information, it is now pos-
sible to apply the proposed solution to both
the Tukey and Scheffé adjustments. The data
are provided by Winer (1962) from the re-
sults of a 2 × 3 analysis of variance, presented
below in Tables 5 and 6.
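
The lookup that Table 4 performs can be sketched in code. The rule used below is an assumption inferred from the ranges quoted in the text (3-4, 5-8, 9-12, and 99-112 through 181-200): the number of unconfounded comparisons is matched to the nearest value of K′(K′ − 1)/2, and an exact tie is resolved in favor of the smaller K′. The function name is hypothetical, and the sketch should not be read as the author's own computing procedure.

```python
def adjusted_k(unconfounded):
    """Adjusted number of treatments K' for a given number of
    unconfounded paired comparisons (illustrative reconstruction)."""
    if unconfounded < 3:
        raise ValueError("Table 4 begins at 3 unconfounded comparisons")
    k = 3
    while True:
        here = k * (k - 1) // 2          # comparisons implied by K' = k
        nxt = (k + 1) * k // 2           # comparisons implied by K' = k + 1
        # Move up only when the next value is strictly closer,
        # so that ties stay with the smaller K'.
        if abs(nxt - unconfounded) < abs(here - unconfounded):
            k += 1
        else:
            return k


if __name__ == "__main__":
    for n in (3, 6, 9, 30, 190):
        print(n, adjusted_k(n))
```

Under this rule the nine unconfounded comparisons of a 2 × 3 table give K′ = 5, and the 30 unconfounded comparisons of a 3 × 4 table give K′ = 8, compared with the 6 and 12 treatments that would otherwise enter the q statistic.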
The Tukey Application

The test used here is described in Snedecor
(1956). The number of unconfounded paired
contrasts is nine, as given by three within
each of two rows (yielding six comparisons)
plus one within each column (yielding an-
other three comparisons).


TABLE 5


RESULTS OF ANALYSIS OF VARIANCE
(Winer, 1962, p. 234)


Source of variation     SS        df     MS       F

A                       18.00      1     18.00    2.04
B                       48.00      2     24.00    2.72
AB                     144.00      2     72.00    8.15*
Within cell            106.00     12      8.83

Total                  316.00     17

* p < .001.

The number of adjusted K′ treatments,
based upon nine unconfounded comparisons, is
given in Table 4 as 5. The q.05 statistic, based
upon five adjusted treatments and df = 12 in
the error term from the results of the analysis
of variance in Table 5, is 4.51 (given in
Snedecor, 1956, Table 10.6.1, p. 252). The
standard error of the mean is given by Sx̄
= √(8.83/3) = 1.71. The smallest mean differ-
ence required for statistical significance at the
.05 level, for the unconfounded contrasts only,
is given by multiplying q.05 by Sx̄, that is,
(4.51)(1.71) = 7.71, which is used for each of
the nine unconfounded paired comparisons.
Here a₂b₁ (with x̄ = 10) is statistically greater
than a₂b₂ (with x̄ = 2), and a₂b₃ (with x̄ = 12)
is statistically greater than a₂b₂ (with x̄ = 2),
since both these mean differences exceed 7.71,
required for significance at the .05 level.
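
The arithmetic of the Tukey application can be retraced as follows. The sketch takes the cell means from Table 6, the within-cell mean square and cell size from Table 5, and the tabled value q.05 = 4.51 quoted above; the variable and function names are assumptions introduced for illustration. Because the code carries full precision, its critical difference comes out near 7.74 rather than the 7.71 obtained in the text from the rounded standard error of 1.71; the same two contrasts exceed either value.

```python
from itertools import combinations
from math import sqrt

# Cell means of the 2 x 3 interaction (Table 6; Winer, 1962).
MEANS = {
    ("a1", "b1"): 4, ("a1", "b2"): 8, ("a1", "b3"): 6,
    ("a2", "b1"): 10, ("a2", "b2"): 2, ("a2", "b3"): 12,
}
MS_WITHIN = 8.83      # within-cell mean square, Table 5
N_PER_CELL = 3        # subjects per cell
Q_05 = 4.51           # tabled q for K' = 5 treatments, df = 12


def unconfounded_pairs(means):
    """Pairs of cells that share a row or a column."""
    return [(c1, c2) for c1, c2 in combinations(sorted(means), 2)
            if c1[0] == c2[0] or c1[1] == c2[1]]


def significant_pairs(means, critical_difference):
    """Unconfounded pairs whose mean difference exceeds the criterion."""
    return [(c1, c2) for c1, c2 in unconfounded_pairs(means)
            if abs(means[c1] - means[c2]) > critical_difference]


standard_error = sqrt(MS_WITHIN / N_PER_CELL)
critical = Q_05 * standard_error

if __name__ == "__main__":
    print(round(standard_error, 2), round(critical, 2))
    print(significant_pairs(MEANS, critical))
```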


The Scheffé Application


For the Scheffé test, the critical value
of F.05 is given by (K − 1) multiplied by
F.05 (K − 1, df). The number of adjusted or
K′ treatments from Table 4 is 5, once again.
Substituting in the formula above, we obtain
(5 − 1)[F.05 (4, 12)] = 4(3.26) = 13.04.
Since q = √(2F), the corresponding .05
statistic is √(2(13.04)) = 5.11. The Sx̄ is once
again 1.71. Therefore, the mean difference re-
quired for the .05 level is (5.11)(1.71) or 8.74.
The only paired contrast that reaches statis-
tical significance, at the .05 level,
is the a₂b₃ mean (value = 12), compared to the
a₂b₂ mean (value = 2), since 12 − 2 = 10,
which is greater than the required 8.74 (which is
used for the combination K′ treatment con-
trasts as well).


TABLE 6


MEANS OF A × B INTERACTION
(Winer, 1962, p. 235)


              Factor B
Factor A    B₁      B₂      B₃

A₁           4       8       6
A₂          10       2      12
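
The Scheffé computation in this section can be retraced in the same way. The sketch below follows the steps stated in the text: (K′ − 1) times the tabled F, conversion through q = √(2F), and multiplication by the standard error of the mean. The function name and argument names are assumptions made for illustration, and the tabled value F.05(4, 12) = 3.26 is taken from the text rather than computed. At full precision the required difference is about 8.76 instead of the 8.74 obtained from rounded intermediate values; the conclusion is unchanged.

```python
from math import sqrt


def scheffe_critical_difference(k_adjusted, f_tabled, ms_error, n_per_cell):
    """Return (adjusted critical F, equivalent q, required mean difference)."""
    f_critical = (k_adjusted - 1) * f_tabled     # (K' - 1) * F(K' - 1, df)
    q_equivalent = sqrt(2 * f_critical)          # q = sqrt(2F)
    standard_error = sqrt(ms_error / n_per_cell)
    return f_critical, q_equivalent, q_equivalent * standard_error


if __name__ == "__main__":
    f_crit, q_equiv, required = scheffe_critical_difference(5, 3.26, 8.83, 3)
    print(round(f_crit, 2), round(q_equiv, 2), round(required, 2))
    # Differences for the two contrasts discussed in the text.
    for label, difference in (("a2b3 vs a2b2", 12 - 2), ("a2b1 vs a2b2", 10 - 2)):
        print(label, difference > required)
```

Only the a₂b₃ versus a₂b₂ difference of 10 exceeds the criterion; the a₂b₁ versus a₂b₂ difference of 8, which passed the Tukey criterion, does not, reflecting the more conservative character of the Scheffé method.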

This study has attempted to extend the
multiple-range tests developed by Tukey and
Scheffé to interaction tables in the analysis
of variance, without disturbing the [...]
principles upon which each test rests when
applied to the one-way analysis of variance.
Although the illustrations were based upon a
factorial analysis with only one measurement
per subject, the adjustment can just as well
be applied to a repeated-measures design, pro-
viding one uses the appropriate error term
(between or within) to obtain the standard
error of the mean. Also, the example [...]
contained equal numbers of subjects in each
experimental condition. In analyses of vari-
ance employing unequal numbers of subjects,
the procedures outlined in this article may be
used in conjunction with the formulas pro-
vided by Kramer (1956). Thus, the adjust-
ment described here has a complete [...]
applicability to the various possible analysis
of variance designs.


TRADITIONAL VERSUS BEHAVIORAL
PERSONALITY ASSESSMENT: 
A COMPARISON OF METHODOLOGICAL AND 
THEORETICAL ASSUMPTIONS 


MARVIN R. GOLDFRIED² AND RONALD N. KENT


State University of New York at Stony Brook 


The traditional and behavioral approaches to the prediction of human behavior 
are examined with respect to such underlying assumptions as the basic concep- 
tion of personality functioning, the selection of test items, and the interpretation 
of the responses to the test. Whereas traditional tests of personality involve 
the assessment of hypothesized personality constructs which, in turn, are used 
to predict overt behavior, the behavioral approach entails more of a direct 
sampling of the criterion behaviors themselves. In addition to requiring fewer 
inferences than traditional tests, behavioral assessment procedures are seen as 
being based on assumptions more amenable to direct empirical test and more 
consistent with empirical evidence. The available research findings on the com- 
parative predictive ability of the two approaches similarly suggest that the 
behavioral orientation is a potentially useful approach toward the construction
of assessment procedures that can more accurately predict human behavior.


[...] the many changes that have oc-
curred in clinical psychology in recent
years, the very sharp increase in the popular-
ity of behavior modification procedures rep-
resents one of the most exciting trends. As a
treatment approach based upon the applica-
tion of experimentally derived psychological
[...] to the clinical setting, behavior
modification encompasses a number of dif-
ferent [...] that have been found to be
[...] in alleviating a variety of different
behavioral problems.

[...] the successful implementation of
behavior modification techniques depends di-
rectly upon an adequate assessment of the
behaviors in need of change and those
[...] maintaining these behaviors, this
approach to treatment has also stimulated a
renewed interest in clinical assessment (Gold-
fried & Pomeranz, 1968; Kanfer & Saslow,
1969; Mischel, 1968; Peterson, 1968).
Greenspoon and Gersten (1967) have noted
the importance of clinical assessment for the
effective application of behavior modification
procedures, and argued for the possible utility
of currently available tests of personality.
Close examination reveals, however, that there
are certain basic assumptions associated with
traditional personality tests which make this
approach to assessment less appropriate from
a behavioral viewpoint. Just as the behavioral
framework has been used to generate new
therapeutic procedures, it would seem advan-
tageous to use this orientation for the con-
struction of new assessment techniques as well.

1 Preparation of this paper was supported by
[...] Grant MH15044 from the National Insti-
tute of Mental Health.
The authors are grateful to James H. Geer, Anita
Go[...], Kathleen Kent, and P. Scott Lawrence
for their many helpful comments on an earlier draft
of this paper.

2 Requests for reprints should be sent to Marvin R.
Goldfried, Department of Psychology, State Uni-
versity of New York at Stony Brook, Stony Brook,
New York 11790.

Before proceeding further, we would like to 
note briefly what we mean by “traditional” 
and “behavioral” approaches to assessment. 
Although the ultimate goal of both procedures 
may be essentially the same (e.g., the predic- 
tion of human behavior), the general approach 
that has been employed in the pursuit of this 
goal has differed. The traditional approach to 
personality assessment has been directed pri- 
marily toward an understanding of the indi- 
vidual's underlying personality characteristics 
or traits as a means of predicting behavior. 
This general approach to assessment is re-
flected in most of our currently available per-
sonality tests, both projective techniques
(Rorschach, Thematic Apperception Test,
Draw-A-Person, etc.) as well as objective 
personality inventories (Minnesota Multi- 
phasic Personality Inventory [MMPI], Cali- 
fornia Psychological Inventory, etc.). The 
behavioral approach to personality assess- 
ment, by contrast, involves more of a direct 
measurement of the individual's response to 
various life situations. The techniques associ- 
ated with behavioral assessment include the 
observation of individuals in naturalistic situ- 
ations, the creation of an experimental ana- 
logue of real-life situations via role playing, 
and the utilization of the individual’s self- 
reported responses to given situations. 

These two general approaches to person- 
ality assessment may be differentiated on a 
pragmatic level as well. In the case of the 
traditional procedures, the focus has been on 
the accuracy with which they might be used 
to predict behavior, with relatively little em- 
phasis being placed on their utility for select- 
ing therapeutic procedures (cf. Meehl, 1960). 
The therapeutic approach employed in any 
given case is usually more a function of the 
therapist’s particular orientation than it is of 
psychological test findings (Goldfried & 
Pomeranz, 1968; London, 1964). By con- 
trast, the interest in behavioral assessment 
has been generated by its utility for provid- 
ing the information essential to the selection 
and implementation of appropriate behavior 
modification procedures. Some of the possible
reasons underlying the differential clinical
utility of the two approaches have been dis-
cussed in greater detail elsewhere (Goldfried
& Pomeranz, 1968) and are not dealt with
here. Instead, the primary point of compari-
son on which we focus involves those assump-
tions underlying each approach, as well as
the potential use each has for accurately pre-
dicting human behavior.

Although traditional personality tests and
behavioral tests share many of the same
methodological assumptions (e.g., reliability
of scoring, adequacy of standardization), the
purpose of this article is to compare the
differing assumptions involved in each. These
assumptions are described and evaluated for
both traditional and behavioral approaches to
assessment, after which a comparison is drawn
between these two approaches for predicting
human behavior.


ASSUMPTIONS INVOLVED IN THE TRADITIONAL
APPROACH TO ASSESSMENT


Conception of Personality

Most of our currently available personality
tests are based on a common conceptualiza-
tion of human functioning, and have been
directed toward obtaining information rele-
vant to the underlying “personality structure”
of the individual. Depending upon one’s spe-
cific theoretical orientation, these inferred
characteristics may consist of “motives,”
“needs,” “drives,” “defenses,” “traits,” or
other similar psychodynamic constructs.

For the most part, the various psychody-
namic or trait theories are based on the no-
tion of psychic determinism, whereby a per-
son's actions are assumed to be motivated by
certain underlying dynamics. Following
this conception of personality, the most ap-
propriate way in which to predict human be-
havior should be based on a thorough assess-
ment of those inferred characteristics of
which the overt actions are believed to be a
function.

Basic to the traditional conception of per-
sonality functioning is the assumption that
consistencies in behavior exist relatively
independent of situational variations. Findings
from several studies, however, fail to
support this assumption. The effect of
stimulus conditions on behavior was
demonstrated by Hartshorne and May's
(1928) classic study on honesty, in which
children were provided with opportunities to
cheat, lie, and steal in a diversity of situations
(e.g., home, party games, athletics). Hart-
shorne and May found a lack of a general
code of morals, and concluded that “as we
progressively change the situation we pro-
gressively lower the correlations between the
tests [p. 384].” More recently, Endler and
Hunt (1966, 1969) have similarly demon-
strated the importance of situational factors
in their S-R Inventory of Anxiousness, in which
they found that the interactions between situ-
ations and subjects contributed more to the
total variance than did the variance associated
with individual differences alone. On the basis 
of both questionnaire data and direct observa-
tions, Moos (1969) similarly found a substan-
tial proportion of the variance resulting from
Subject X Setting interactions. Although Mis-
chel and Ebbesen (1970) have reported indi-
vidual differences among children in the abil-
ity to delay gratification, it was possible to
substantially modify the period of delay by
variations in the particular situation.
Mischel (1968) has reviewed a number of
other studies dealing with this issue, the re-
sults of which have failed to confirm the con-
ception of human functioning which does not
take into account the importance of environ-
mental influences on behavior.


Selection of Test Items 


An important consequence of the view that
consistencies in behavior exist independent of
situational variables has been the fact that
relatively little emphasis has been placed on
specifying the procedures to be used in se-
lecting the pool of stimulus items for tradi-
tional tests of personality (Loevinger, 1957).
Although test constructors often employ rig-
orous selection procedures to obtain a final
set of items, the procedures for defining the
original item pool are rarely discussed. This origi-
nal pool is obviously not determined on
a random basis, but is presumably related to
the test constructor's assumptions regarding
the nature of the personality variables in
question.

Loevinger (1957) and Jessor and Ham-
mond (1957) have criticized this poorly de-
fined approach to selecting the initial set of
items, and suggested instead the use of a
specific theoretical orientation as an alternate
guide in selecting the item pool. In using
personality theory to select test items, one
assumes that the theory in question has some
validity. Although certain constructs asso-
ciated with specific theoretical orientations
have received some research support, there
does not appear to be any one personality
theory which, as yet, has achieved adequate
empirical confirmation to justify such
an approach to item selection (cf. Pervin,
1970).


Interpretation of Test Responses


While variations in the stimulus aspects of 
the test situation have been considered to be 
relatively unimportant, the interpretation of 
responses has been the subject of extensive 
and detailed consideration. The interpretive 
significance of any particular test response 
may be determined by either intuitive or em- 
pirical means (Hase & Goldberg, 1967; Loev- 
inger, 1957). The intuitive approach may be 
based on an informal rationale with few ex- 
plicit theoretical assumptions, or may involve 
more formal deductions from theory. In using 
the empirical approach, on the other hand, 
the interpretive significance of test responses 
is derived solely from the empirically estab- 
lished relationship between test and external 
criteria.

Even in those instances where the intuitive 
approach is used initially to specify the in- 
terpretive significance of various signs, em- 
pirical checks on the accuracy of those in- 
terpretations are clearly needed. However, 
logical or theoretical assumptions underlying
the meaning of test signs are often so firmly 
held that failure to obtain empirical confirma- 
tion often has little effect on changing clini- 
cians’ interpretations. Thus, despite evidence 
indicating that Hutt and Briskin's (1960) 
proposed interpretation of various signs lacks 
empirical confirmation (Goldfried & Ingling, 
1964), the revised edition of the manual 
(Hutt, 1968) continues to recommend the use 
of invalid interpretations. 

The tendency for clinicians to retain cer- 
tain interpretive hypotheses about test scores 
is directly illustrated in a study by Chapman 
and Chapman (1969), where they asked ex- 
perienced psychodiagnosticians to determine 
which of several Rorschach signs reflected 
male homosexuality. The signs presented to 
the clinicians were selected from Wheeler’s 
(1949) initial list of 20 possible Rorschach 
indicators of homosexuality, only some of 
which have held up under empirical test. The
results indicated that clinicians tended to 
select those signs which they believed to be 
most indicative of male homosexuality on a 
rational-intuitive basis (e.g., “buttocks”),
despite the fact that research findings failed
to confirm the empirical validity of these
signs, and almost never selected those in-
dices that were, in fact, empirically valid. As
noted by the authors, these “illusory correla- 
tions” between sign and inferred character- 
istics are likely to be a source of considerable 
error in predicting criterion behaviors. 
Although such illusory correlations are only 
a problem for intuitively derived tests, other 
potential sources of error exist with empiri- 
cally derived measures. For example, despite 
the fact that the clinical scales on the MMPI 
were originally derived to assist in diagnostic 
classification, the test is currently being used 
for more complex decisions (Dahlstrom & 
Welsh, 1960; Little & Shneidman, 1954; 
Rychlak, 1968). Rather than simply using an 
MMPI protocol to determine diagnostic cate- 
gory, the clinician more typically carries out 
a profile analysis, in which both the absolute 
and relative scores on the various scales are 
used to construct a personality description. 
Any such interpretation must assume a high 
correspondence between the diagnostic cate- 
gories associated with the scales and the in- 
ferred personality traits. Such an assumption, 
however, has failed to receive empirical sup- 
port, in that considerable overlap in behav- 
ioral characteristics has been found to exist 
among the various diagnostic classifications 
(Mischel, 1968; Zigler & Phillips, 1961). 
Another assumption basic to the interpre-
tation of test responses is that the protocol
provides a sufficient sampling of the indi-
vidual’s personality characteristics (MacFar-
lane & Tuddenham, 1951; Murstein, 1961).
In the case of projective techniques, Mur-
stein acknowledged that since this “sampling”
is determined by the subject’s [...] the test
situation [...] adequacy of the [...] virtually
nonexistent, with the exception of Loevinger’s
(1957) suggestion that theory serve as the
background for test construction. [...]


ASSUMPTIONS INVOLVED IN THE BEHAVIORAL
APPROACH TO ASSESSMENT


Conception of Personality

[...] individual “has,” the behavioral view of hu-
man functioning places greater emphasis on
what a person “does” in various situations
(Mischel, 1968). That is, rather than hy-
pothesizing certain underlying constructs
(e.g., “instincts” or “needs”) which are be-
lieved to function as motivational determi-
nants of behavior, the basic unit for consid-
eration involves the individual's response to
specific aspects of his environment. Behavior
is viewed as being determined not only by the
person's previous social learning history but
also by current environmental antecedents
and/or consequences of the behavior in
question.
The behavioral orient 

well represented by Wallace’s 


ation to personality X 
(1966, 1967) 
used the COP 
to 


individual's behavioral repertoire = à 
which is determined primarily DY 
learning experiences. This closely ! 


; : j ne SI 
what is typically referred to when er ity 


r learnt 


»aral e 
peak 


vidual will actually perform in 
however, will depend on the exten Vor 
certain situational factors elicit a itis 
force this particular response. From thls 
of view, then, personality may b 

as an intervening variable that 
according to the likelihood of an 
manifesting certain behavioral te! 
the variety of situations that C9 
day-to-day living. . 

As noted in conjunction with € 
of the traditional conception O 
the available research evidence in 
indicate that the likelihood of me 
responding in a certain way d 
on his own response capability (cf 
the nature of the situation as WC rne © 
& Hunt, 1966, 1969; Hartshorn. 
1928; Mischel & Ebbesen: pi 
1969). After reviewing the aui 
erature regarding the consisten s 
ality variables, Mischel (1968) 
that “behaviors which are we a 
stable personality trait indica j de 
highly specific and depend 9! respon” 
the evoking situations and we 3 r 
employed to measure them [P- 


indiv! in 
ndencl® pis 
mpris 
sur discus, 

pers 
does: 


Selection of Test Items 


Consistent with a conception of personality
which emphasizes the individual's specific re-
sponse to specific situations, a crucial assump-
tion of behavioral tests is that stimulus situ-
ations are adequately represented. Adequate
representation of situations requires not only
careful simulation during the measurement
process (e.g., movies, slides, written descrip-
tions) but also rigorous definition of the
appropriate pool of situations. For example, in
surveying fear behavior (Geer, 1965; Wolpe
& Lang, 1964), it is necessary to obtain mea-
sures of fear in situations which sample, in a
representative manner, the population of po-
tentially anxiety-producing situations. In se-
lecting the stimulus items, then, the concept
of content validity, as it has been traditionally
applied to proficiency tests, becomes highly
relevant for behavioral assessment.
Goldfried and D'Zurilla (1969) have fol-
lowed this line of thinking and developed a
“behavioral analytic” method for test construc-
tion. For example, Goldfried and D’Zurilla
have applied the behavioral analytic model in
a study of college freshman effectiveness by
identifying those responses which are differ-
entially rewarded by significant individuals in
situations which define the college environ-
ment. The initial step in selecting a pool of
test items consisted of a “situational analy-
sis” in order to obtain a sample of prob-
lematic situations that were likely to be
encountered by freshmen during their first se-
mester. The situational analysis was accomp-
lished by means of written daily records of
problematic situations obtained from fresh-
men themselves, as well as through interviews
with staff members having frequent contact
with freshmen. The large pool of situations
resulting from these procedures was then pre-
sented to a [...] sample of freshmen during
[...] semester, who were asked to indi-
cate which of these situations they had ever
encountered. Only those instances that had a
high likelihood of occurrence were retained in
the pool. The following is an example of
such a situation (Goldfried & D'Zurilla,
1969):

A composition is due in your English class
[...]. It was assigned a week before and is on
a rather [...] topic, which you really don't under-
stand. On Wednesday afternoon when you sit down to
work, you find that you have absolutely no idea 
about what to include in the paper. You realize, 
however, that you must start writing the composi- 
tion soon in order to have it in on time. 


This approach to the selection of an item 
pool carries with it the assumption that the 
informants have provided an accurate account 
of the problematic situations associated with 
the college environment, as well as the as- 
sumption that the particular situations se- 
lected will continue to have a high probabil- 
ity of occurrence over the period of time 
during which the assessment is to take place. 
These assumptions are capable of direct em- 
pirical confirmation by utilizing a different 
group of informants, and by surveying the 
frequency of occurrence of these situations at 
a later point in time. 
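
The retention step in the situational analysis, and the later re-check of how often the retained situations occur, can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name, the 0.5 cutoff, and the survey data are hypothetical, and the source does not report the threshold that was actually used.

```python
def retain_situations(reports, min_proportion=0.5):
    """Return the situations encountered by enough respondents.

    `reports` maps a situation label to a list of booleans, one per
    respondent (True means the respondent reported encountering it).
    The 0.5 default cutoff is an illustrative assumption, not a value
    taken from the source.
    """
    retained = []
    for situation, answers in reports.items():
        if not answers:
            continue  # no answers, so no proportion can be estimated
        proportion = sum(answers) / len(answers)
        if proportion >= min_proportion:
            retained.append(situation)
    return retained


# Hypothetical survey data for three candidate situations.
sample_reports = {
    "composition due, topic not understood": [True, True, True, False],
    "roommate plays loud music": [True, False, False, False],
    "unsure how to approach an instructor": [True, True, False, False],
}

kept = retain_situations(sample_reports)
print(kept)
```

Running the same function on reports gathered from a different group of informants, or at a later point in time, would give one simple check on whether the retained situations continue to have a high probability of occurrence.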


Interpretation of Test Responses 


In discussing the assumptions underlying 
the interpretation of behavioral tests, we may 
note the basic distinction drawn by Good- 
enough (1949) between the "sign" and the 
“sample” approaches to the interpretation of 
test responses. The sign approach assumes 
that the response may best be construed as an 
indirect manifestation of some underlying 
personality characteristic. The sample ap- 
proach, on the other hand, assumes that the 
test behavior constitutes a subset of the ac- 
tual behaviors of interest. Whereas tradi- 
tional personality tests have typically taken 
the sign approach to interpretation, behav- 
ioral procedures approach test interpretation 
with the sample orientation. 

The assumption that behavioral test re-
sponses constitute a sample of certain response
tendencies is closely tied to the assumption
that the test items themselves consist of a
representative sample of situations relevant
to the [...] of interest. In the assessment
of [...] toward authority, for exam-
ple, a sampling interpretation of the indi-
vidual's test responses would rest on the as-
sumption that the test items represent an
adequate sample of interpersonal situations
involving authority figures.

Another issue related to the interpretation
of test behavior as a sample of criterion be-
havior is that of the method employed in
allowing for the expression of the response.
The ideal approach to response expression
would constitute the individual's actual re-
sponse in a real-life situation, in that this
represents the most direct approach to be- 
havioral sampling. Within certain settings, 
such as hospitals and classrooms, this ap- 
proach can be readily implemented. Unless 
these observations are unobtrusive, the as- 
sumption that the behavior remains unaffected 
by the assessment procedures may turn out to 
be faulty. For example, a study by Patterson 
and Harris (Footnote 3) suggests that such considerations
as deciding to have an outside observer, 
rather than the mother herself, make observa- 
tions of her child's behavior in the home may 
produce differences in the data obtained. 
Moos (1968) similarly found a tendency for 
hospitalized patients to respond differently 
when they were aware that their behavior was
being observed. 

In instances where unobtrusive direct ob- 
servation procedures are not feasible, other 
approaches to response expression must be 
employed. A potentially useful alternative 
consists of role-playing procedures, where the 
individual is required to act out his response 
as if he were actually in the situation in 
question. The available research on the use of 
role playing for assessment offers some sup- 
port for the assumption that the role-played 
response parallels the behavior in the real-life 
setting. For example, in an attempt to predict 
competent behavior under conditions of inter- 
personal stress, Stanton and Litwak (1955) 
found that subjects' responses to role-playing
situations involving such stress correlated .82
with independent ratings by informants who
were familiar with the subjects’ behavior in
this type of situation. Further support for
the value of role playing as an assessment
procedure comes from the work of Weiss
(1968) on the prediction of “reinforcing skill”
in interpersonal situations. This assessment
technique consists of placing the subject in a
role-playing situation where he is asked to
listen to a speaker and do whatever he can to
“maintain rapport” with this individual. The
frequency of the subject’s reinforcing behav-
iors as recorded in the role-playing assess-
ment has been found to be predictive of peer
ratings for social behavior (e.g., “fun at a
party”), and of independent ratings of [...]
competence and reinforcing skill among
therapists.


3 G. R. Patterson & A. Harris. Some methodo-
logical considerations for observation procedures.
Paper presented at the meeting of the American
Psychological Association, San Francisco, September
1968.


Still another approach in assessing an indi-
vidual's response is through self-reports of
behavior. The assumption associated with the
use of self-report is that the individual can
accurately observe and communicate his reac-
tion to certain situations. The most common
use of self-report procedures has been in con-
junction with the assessment of anxiety. In the
use of fear survey schedules, for example, a
list of potentially frightening objects (e.g.,
blood, spiders) and situations (e.g., being
criticized, being alone) is presented to the
subject, who must indicate the extent to which
he subjectively finds each of these
frightening. Findings by Geer (1965) indicate
that subjects' responses to specific items on
the fear survey schedule (i.e., dogs, rats)
were predictive (e.g., rho correlations ranging
from .52 to .92) of their fear reaction when
placed in an actual situation where they had
to approach these objects. Paul's (1966) suc-
cessful use of the S-R Inventory of Anxious-
ness to predict anxiety in a public speaking
situation (r = .50 and .72 for two
samples) similarly adds credibility to the as-
sumption that verbal self-prediction bears a
good correspondence to behavior in real-life
situations.
The use of direct observation, role playing,
or self-report procedures for behavioral as-
sessment is based on the assumption that the
particular method selected contributes little
variance to the responses that are elicited.
Although it is possible that the results ob-
tained by using each of these different meth-
ods will differ, there currently exists little re-
search evidence bearing on this question. Us-
ing Campbell and Fiske's (1959) multitrait-
multimethod approach to studying method
variance, however, the degree of correspondence
among responses obtained through these pro-
cedures can readily be subjected to em-
pirical test.
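
One simple way to examine the correspondence among methods is to correlate the scores that the same subjects obtain under each method. The sketch below is a minimal illustration only: it computes Pearson correlations between every pair of methods for a single behavior, which covers just one part of a full multitrait-multimethod analysis. The function names and all scores are hypothetical and are not data from the studies cited.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / sqrt(sxx * syy)


def method_correspondence(scores):
    """Correlate every pair of assessment methods.

    `scores` maps a method name to that method's scores for the same
    subjects, listed in the same order.
    """
    methods = sorted(scores)
    return {
        (a, b): pearson_r(scores[a], scores[b])
        for i, a in enumerate(methods)
        for b in methods[i + 1:]
    }


# Hypothetical anxiety scores for five subjects under three methods.
hypothetical_scores = {
    "observation": [1, 2, 3, 4, 5],
    "role_play": [2, 4, 6, 8, 10],
    "self_report": [5, 4, 3, 2, 1],
}

correspondence = method_correspondence(hypothetical_scores)
for pair, r in correspondence.items():
    print(pair, round(r, 2))
```

High positive correlations across methods would be consistent with the assumption that the method contributes little variance. Low or negative values, such as the deliberately reversed self-report scores in this made-up example, would count against it.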
In addition to the question of the
method employed in the assessment proce-
dures, there is also the issue of how to cate-
gorize or score this response. Rather than at-
tempting to develop scoring keys based on
either intuitive or empirical grounds, the be-
havioral approach to assessment utilizes in-
formation about the actual criterion behav-
iors, that is, those behaviors which are the
target of the prediction. In the case of the
S-R Inventory of Anxiousness (Endler &
Hunt, 1966, 1969), for example, 14 “modes
of response” were selected to assess the indi-
vidual’s “anxiety” reaction to any given situ-
ation. These several modes of response (e.g.,
heart beating faster, getting an uneasy feeling,
wanting to avoid a situation, perspiring)
were chosen to represent the multidimensional
character that has been found to typify the
actual fear response (cf. Lang, 1968).

In their description of the behavioral ana-
lytic approach to assessing effectiveness, Gold-
fried and D'Zurilla (1969) have suggested
guidelines for conducting a criterion analysis
to be used in establishing standards for scor-
ing behavioral measures. This behaviorally
oriented criterion analysis consists of (a) a
situational analysis, in which the relevant en-
vironmental events are sampled; (b) a re-
sponse enumeration, where a pool of responses
to each of these several situations is ob-
tained; and (c) the response evaluation, which
entails the categorization of each potential
course of action in a situation according to its
degree of effectiveness. So as to parallel the
actual effectiveness of these several behav-
iors in the real-life setting, it is recommended
that the judgments of effectiveness be made
by “significant others,” that is, individuals
who are respected by those people toward
whom the assessment is being directed, and
who play the role of labeling behavior as be-
ing effective or ineffective in that particular
environment.

The basic assumption underlying this ap-
proach to establishing scoring criteria is that
there exist common standards or behav-
ioral norms for effectiveness within the par-
ticular life setting in question, and that these
standards are relatively stable over the period
of time during which the assessment is to take
place. In light of the rapidly changing value
systems associated with many aspects of our
society, this assumption may prove to be
faulty in the assessment of certain behaviors
(e.g., students’ interactions with authority).
It should be emphasized, however, that failure 
to confirm empirically the existence of a 
stable set of behavioral norms would have 
implications not only for the establishment of 
scoring criteria but also for the selection of 
criterion behaviors against which any valida- 
tion could take place. However, this would 
pose a problem for any attempt at predicting 
human behavior, whether it be behavioral or 
traditional. 


COMPARISON OF TRADITIONAL AND
BEHAVIORAL ASSESSMENT 


At this point, a more direct comparison 
may be drawn between traditional and behav- 
ioral approaches to assessment, taking into 
account the levels of inference associated with 
each, and the available data on comparative 
predictive ability. 


Comparative Levels of Inference 


In order to compare more systematically 
the nature of the assumptions involved in tra- 
ditional and behavioral assessment, the infer- 
ences associated with the prediction of be- 
havior from each of these approaches have 
been depicted graphically in Figure 1. The
arrows in the figure pointing upward refer to
those assessment inferences or inductions asso-
ciated with the interpretation of the test
scores themselves, whereas the arrows point-
ing downward reflect validation inferences,
that is, deductions from the interpretation of
the test which are associated with the predic-
tion of the criterion. The three levels of in-
ference associated with both the induction and
deduction include: (a) Those inferences that
allow one to conclude that the recorded ob-
servation accurately reflects the occurrence of
some specific event, namely, the “true” test
response, or the “true” criterion behavior.
This is clearly the most basic of all inferences
and carries with it such assumptions as the
reliability with which the test response and
criterion behavior is recorded and scored, as
well as the absence of variance which might
be attributed to the specific method employed
in assessing the event in question (cf. Camp-
bell & Fiske, 1959). (b) The second level of
inference, in which one concludes that the
event being measured is a sample of some 
larger population, is based on the assumption 
that the responses obtained by the test—or 
the criterion behaviors selected for observa- 
tion—are representative of the relevant as- 
pects of the entire population of true responses 
(or criterion behaviors) in question. (c) Fol- 
lowing the assumption that the population of 
responses has been adequately sampled, it
may additionally be inferred that the test
performance has been indicative of some un-
observable construct, and in turn, that this
construct may be reflected by certain criterion
behaviors. This third level of inference re- 
quires theoretical assumptions that describe 
the relation between the construct in ques- 
tion, and both the Population of responses 
and the population of criterion behaviors. 
FIG. 1. Levels of inference in traditional and behavioral tests. [Figure
not reproduced. Recoverable labels: Inferences Associated With Traditional
Tests; Inferences Associated With Behavioral Tests; Levels of Inference;
Personality Construct; Theoretical Assumptions; Test; Criterion Behaviors.]


Using this schematic outline of levels of
inference, we may compare the prediction
process from two approaches to assessment.
Let us consider the prediction of anxiety in
same-sex peer relations, first from a tradi-
tional point of view and then from a be-
havioral approach. [...] would be that
the relevant responses as observed (e.g.,
presence of inanimate movement on the
Rorschach or high social introversion score
on the MMPI) are only negligibly affected by
scoring error or any artifacts associated with
the measurement process itself. These [...]
responses, in turn, are assumed to constitute
an adequate sample of some hypothetical
population of potential anxiety-related
responses which the particular [...]
give if, for example, the measure [...]
same-sex peer relations. [...] se-
lected would vary from theory to theory, but
might include such inferred characteristics as
[...] homosexual tendencies, incongruence
between self- and ideal-concept, or [...]
hypothesized variables. In using [...] behavior,
[...] relate the inferred personality [...]
same-sex peers. Depending upon the particu-
lar theory of personality employed at this
stage of the prediction process, the behavioral
definition is apt to vary. From this hypo-
thetical population of behaviors, specific be-
havioral interactions are then sampled (e.g.,
conversational ability in specific social situ-
ations), and procedures for measuring these
criterion behaviors are selected (e.g., direct
observation, peer ratings).

A behavioral test to predict anxiety in same-sex peer relationships, by contrast, would consist of placing the subject in a representative sample of situations requiring peer contact and eliciting his response. Inasmuch as this procedure might take any one of several forms (e.g., direct observations in actual situations, role-playing, self-report), it is assumed that the individual's “true” response contributes more variance than does the particular method selected. The next level of inference is based on the assumption that the elicited responses have provided an adequate sample of potential criterion behaviors which define anxiety in such relationships. In contrast with the traditional approaches to personality assessment, behavioral assessment views the responses given as consisting of some of the criterion behaviors themselves.
irectly nae these behaviors are cert 

TAE m an empirically defined criterion 
i A (e.g., sampling situations involving 
the ans with peers), thereby eliminating 
the need for either the inductive or deductive use of theoretical assumptions.

id ia behaviors to be used for the 

n have been specified, essentially the 
tragi PSY chometric procedures employed sd 
Useq ne personality measures would | E 
7" testing for criterion-related validity. 
roaches "P barison between the two ap” 
Simplif to prediction, in fact, is over- 
ticia, 6d. Rarely would a traditional diagnos- 

tet to a criterion measure, such 
Say cty in relations with members of the 


m s 
esp iy On the basis of one type of test 
ality Ao and the inference from one person- 
Dong. Struct. Rather, several types of re- 


ry r i 3 
inferred. Then, as a further inference involving
additional theoretical assumptions,
these constructs would be interrelated
to provide a dynamic picture of the indi- 
vidual's personality functioning. This attempt 
to develop a “theory of a person" (cf. 
Sundberg & Tyler, 1962) involves inferences 
most removed from the data and most 
dependent upon the clinician’s theoretical 
orientation and clinical experience. 

Even in the more simplified form in which 
we have presented it, the comparison under- 
lines a crucial difference in the construction 
of traditional and behavioral tests. To 
recapitulate, traditional personality tests 
have developed methods for eliciting behav- 
iors that may serve as signs of underlying 
personality variables. The responses elicited 
by the tests serve as a basis for theo- 
retical inferences regarding basic person- 
ality functioning, which is then related to 
criterion measures. By contrast, behavioral 
tests, such as those following the behavioral 
analytic model (Goldfried & D'Zurilla, 1969) 
may be developed by working backward from 
criterion measures. That is, a sampling of the 
criterion situation and behaviors is obtained 
first, after which an attempt is made to
develop efficient measurement procedures
for assessing these behavior-environment
interactions.


Comparative Predictive Ability 

It is interesting to note that because of its 
greater predictive potential, there has been a 
general leaning over the years toward more 
direct, criterion-related measures within the 
realm of traditional personality testing. In 
conjunction with a review of the prognostic 
use of various personality measures, for ex- 
ample, Fulkerson and Barry (1961) have 
concluded that of all the possible ways to 
predict behavior, the most accurate predictor 
remains the individual’s previous behavior in 
similar situations. 

After reviewing the research on the TAT, 
Murstein (1963) concluded that certain as- 
pects of the subject’s response to the testing 
situation itself may prove to be potentially 
useful in predicting overt behavior in the cri- 
terion situation. As he noted, however, these 
responses

are not really part of the stories, but represent miniature bits of overt behavior similar to the criterion overt behavior. It is small wonder, therefore, that caustic comments in the telling of a TAT story are related to overt aggression [pp. 318-319].


In a similar vein, Kagan (1956), rather than 
utilizing theoretical constructs to improve the 
relationship between TAT scores and overt 
aggression, was successful in improving pre- 
dictive ability by using stimulus materials 
that more accurately sampled the actual cri- 
terion situation (rather than the typical am- 
biguous stimuli), and scored the protocols for 
only those aspects of aggression (e.g., tend- 
ency to fight) that were of interest in the 
criterion situation.

In the case of the Rorschach, the procedure 
that most closely approximates scoring and 
interpreting test responses as a sample of 
overt behavior has been developed by Fried- 
man (1953). Using Werner’s (1948) descrip- 
tion of perceptual and cognitive development, 
Friedman has described a procedure for using 
each response as a sample of perceptual 
behavior. In comparison to many of the 
other approaches to scoring the Rorschach, 
Friedman’s approach stays much closer to the 
data and involves considerably fewer assump- 
tions and—perhaps as a result of this—has 
consistently resulted in favorable empirical 
validation (Goldfried, Stricker, & Weiner, 
1971). 

In addition to these indirect findings 
regarding the predictive potential of behav- 
ioral assessment, there have been some studies 
in which a direct comparison was made be- 
tween the predictive ability of traditional and 
behavioral assessment procedures (e.g., Car- 
roll, 1952; Hase & Goldberg, 1967; Paul, 
1966; Wallace & Sechrest, 1963).

In conjunction with a larger study on the effectiveness of systematic desensitization as a procedure for reducing anxiety in a public speaking situation, Paul (1966) administered a variety of


tr i 
eatment for Personality and 


[IPAT| Anxiety Scale, 
ty Scale, Bendig's Extra- 
Scale, Anxiety Dif- 
SR Inventory of Anxious- 


riteri j P 
Ce em Mad on the subjects 


a r - OF confidence when in a ub- 
speaking situation, Paul found that the correlations obtained with the S-R Inventory of Anxiousness item involving public speaking exceeded by far those found with the more traditional tests.
with data from two separate samples, the
average correlation obtained between criterion
and the S-R Inventory was .61, in comparison
to correlations of .15 for IPAT anxiety,
of —.24 for extraversion-introversion, in 
for emotionality, and .29 for the Anxiety
Differential.

Wallace and Sechrest (1963) conducted a study in which they compared the predictive accuracy of several projective techniques with the subjects' own self-ratings in the assessment of achievement, hostility, somatic concern, and religious concern. The results of the study indicated that whereas the correlations between criterion (peer ratings on each of these variables) and self-ratings on these dimensions averaged .57, the average correlations for the other tests were .05 for the Rorschach, .08 for the TAT, and .14 for the Rotter Incomplete Sentences Blank.

The results of a study reported by Carroll (1952) similarly found that simple self-ratings yielded higher correlations than did scores on the Guilford-Martin Personnel Inventory. In an analysis of method variance via a multitrait-multimethod correlation matrix, Campbell and Fiske (1959) have noted that Carroll's findings also suggest that the self-ratings seem to be less confounded by method variance.

Using the item pool of the California Psychological Inventory, Hase and Goldberg (1967) compared the predictive ability of scales constructed by various methods (e.g., theoretical, empirical, factor-analytic) with subjects' self-ratings. Against the criterion of peer ratings on such variables as dominance, sociability, responsibility, and other characteristics, Hase and Goldberg found
that *in almost every case, the sub) an 
ratings were more predictive - - * 
of the scales [p. 245]." Even 
regression equations were use 
on the optimal combination 
prediction proved to be more a est in Perg 

Inasmuch as the growing interest in behavioral approaches to assessment is a recent phenomenon, relatively few studies have been carried out to compare their predictive ability with more traditional procedures. The limited data that are available, however, would seem to favor behavioral assessment.


CONCLUSIONS


One of the basic characteristics of behavioral assessment is the attempt to maximize the similarity between test response and criterion measure. The desirability of approaching the prediction process in such a way as to reduce the number of inferences has been argued by Cronbach (1956), who has observed:


Assessment encounters trouble because it involves hazardous inferences. Very little inference is involved when a test is a sample of the criterion or when an empirical key is developed. Simple test interpretations involve inference from test to construct to behavioral prediction. But assessors attempt a maximum inference from tests. As current writers describe the process . . . personality theory is applied to weave nomothetic constructs into a construct of the individual's personality structure; predictions are then derived by inferring how that structure will interact with the known or guessed properties of the situation. Assessors have been foolhardy to venture predictions of behavior in unanalyzed situations, using tests whose construct interpretations are dubious and personality theory which has more gaps than solid matter [pp. 173-174].


While the successfully validated test provides support for the numerous assumptions involved in predicting criterion behaviors from test behavior, the unsuccessful effort at validation provides no clue to the weak link in the inference process. The analysis presented in this article suggests that any breakdown of the predictive powers of a test may productively be viewed as due to one or more inferences based on faulty assumptions concerning the measurement process and the phenomena of interest.

In the case of traditional personality tests, some of the basic underlying assumptions, such as the existence of behavioral consistencies across a wide variety of stimulus situations, have been found to be unsupported by empirical evidence. Other assumptions, such as those involving theoretical relationships hypothesized between test responses and constructs, are often vaguely stated or poorly defined and consequently are difficult to test. The assessment of personality by means of behavioral tests, by contrast, is more consistent with the findings that human functioning is due to both the individual's behavioral
repertoire and the demands of the
specific stimulus situation. Further, relatively 
fewer assumptions are associated with this 
approach to test construction, and those that 
are involved can more readily be subjected 
to direct experimental investigation. By al- 
lowing for a more systematic elimination of 
erroneous inferences when validity coefficients 
are unsatisfactory, the behavioral approach 
to personality assessment would appear to 
have greater potential for the development 
of procedures that may enhance our ability 
to predict human behavior.


C PROCESS ANALYSES TO THE
ERAPEUTIC PROCESSES!

This article briefly describes the way in which methods currently used in psycho- ... studied, suggests a frame of reference by which one could organize psychotherapy data in process terms, and presents Markov chain analyses as a new methodological approach permitting a more direct study of the actual processes in psychotherapy. An attempt is made to describe the Markov analyses in terms that would be understood by readers not acquainted with stochastic statistics.


Process studies in psychotherapy have traditionally attempted to study ongoing patient and therapist behavior throughout the course of therapy. The term process studies has been used to differentiate studies that focus upon the nature of the therapeutic encounter from outcome studies, which focus primarily on the effects of psychotherapy (Strupp & Luborsky, 1962, p. 309). The term process study has been used in reference to a wide variety of empirical investigations, and the notion of process seems to have been used in these studies more as a metaphor than as a construct with clear operational definitions.

Process is defined in Webster’s (1951) dictionary as “(1) A series of actions or operations conducing to an end.” It would follow from the above definition that an adequate operational definition of any process would have to take into account (a) a molecular description or specification of the actions or operations being studied (i.e., the basic units of the process, e.g., sentences, topics, affective expressions, etc.); (b) a molar description of the sequential relationship between these units (e.g., any characteristic ordering of the units within the sequence, how predictable the ordering is, whether it has cyclic properties, whether it always leads to the same end point or whether there are branching points within it); and (c) the designation of the goal(s) or end point(s) toward which it moves.

The author would contend that with the exception of Raush (1965),² none of the psychotherapy process studies in the literature (Auld & Murray, 1955; Marsden, 1971) has employed an experimental design that would permit an adequate examination of psychotherapy as a process. The following is a critique of the experimental designs most commonly used.


CRITIQUE OF THREE BASIC DESIGNS USED IN
PSYCHOTHERAPY PROCESS RESEARCH

As one obtains an overview of the literature 
concerning psychotherapy it becomes apparent 
that one of three basic designs is almost always 
employed: (a) using pre- and posttherapy 
tests; (b) counting the occurrence of various 
categories of behavior within therapy; and 
(c) obtaining the contingencies between various 
categories of behavior. 

The first design treats therapy as an “experimental effect.” Patients are tested before and after therapy; hence, the therapy “process” under investigation is operationally defined by pre- and posttest differences on some set of variables. If there has been a change in test scores, the existence of some process is inferred (e.g., Braaten, 1961; Lanfield, Stern, & Fjeld, 1961; Seeman, 1954).³ The salient weakness of the above type of study is that while a change in test scores may suggest that something must have happened, whatever it was that did happen (presumably some “therapeutic process”) is neither demonstrated nor explicated and thus remains empirically unverified. Further, the psychotherapeutic “process” is treated as one large unit (from pretest to posttest) rather than an unfolding series of smaller units moving toward some end.

2 Raush (1965) compared sequences of interaction between hyperaggressive and normal boys. He operationally defined the beginning and end of his sequences, specified the categories of behavior occurring during those sequences, and contrasted the two groups in terms of differences in the serial ordering of the categorized behaviors.

The second design involves the classification of what actually went on into categories that are usually given labels denoting intention, action, or a small set of actions. Therapist behavior is classified in terms of active-passive, reflective-interpretive, warm-cold, friendly-hostile, etc.; the patient behavior is classified in terms of rigid-flexible, resistance-exploration, etc.; sometimes the moods, affects, and personages are coded (Colby, 1960; Leary & Harvey, 1956; Murray, 1956; Rogers, 1959; Snyder, 1963; Strupp, 1957). Therapy in this design is seen as a universe of patient- and/or therapist-initiated events, and “process” is operationally defined in terms of the frequency (or relative frequency) of occurrence of those events (e.g., Boomer & Goodrich, 1961; Eldred, Hamburg, Inwood, Salzman, Meyersburg, & Goodrich, 1954; Goldman-Eisler, 1954; Hunt, 1950; Tomlinson & Hart, 1962). One major variation of this design consists of dividing therapy into units of time (sessions, months, fractions, etc.), counting the events that occurred within each unit, and comparing frequency distributions of events by units (e.g.,
Cofer & Chance, 1950; Dollard & Mowrer, 
2 Murray, 1954; Saslow, Matarazzo, 

ps, & Matarazzo, 1957). Process here is 
Eun D tug i change in frequency. of 
pei mdr n iste sampled. Both types 
E E Mise are severely limited in 
ating the presenc 

ever knows the ordering 
nit (i.e., the process 
to the next); one 
with which the 


intended as ex-
g discussed. These
istive list of studies

events occurred. It is logically possible, for example, that two time units could have the same frequencies of events but that the events would occur as components of entirely different processes. It is also possible that the same clinical process could be present in both periods of sampling but occur with different frequencies. The resulting change of frequency of events would not be due to an actual change in the original process but merely a change in the frequency of its occurrence. In such cases a proper understanding of psychotherapy requires knowing whether certain processes actually change or whether they, though still intact, merely occur with a different frequency. Data from this second strategy would not provide such information.

The third desi
cern to practicing psychotherapists (e.g.,
Chassan, 1961, 1967, 1970; Chassan & Bellak,
1966; Sargent, 1961).

The operational definition of process in terms of units, temporal relationships between units, and end points would appear to be capable of describing many aspects of the therapeutic encounter quite readily. The therapist concerns himself with cathexes, motivations, and aspirations, or other aspects of patient behavior in which there is an end or goal toward which the patient desires to move. Indeed, the therapist himself is working toward some overall goal or end in the therapeutic encounter. Further, patients and therapists differ among themselves with respect to the specific behavior chosen to achieve their goals, or the order or sequence in which the selected actions are employed. Once therapeutic processes have been specified as suggested above, it would be possible to make comparisons among them in terms of their goals, units, and sequential properties.


Organization of Clinical Data into Operationally 
Defined Processes 

The actual psychotherapeutic encounter may be viewed clinically and experimentally as essentially a very long sequence of behaviors emitted by both patient and therapist. Within this stream of behavior may be seen a dynamic configuration of many interrelated processes. For example, each sentence or phrase uttered by the patient may be seen as a short process having a sequence of word units and a short-term goal of communicating something specific. As the patient speaks several such sentences and phrases, each may be viewed as a unit in a larger process used, for example, to give information or to express affect, or used as a process to defend against certain expressions. In other words, one may see the expression of information, affect, or defense against it as a process having each sentence or phrase as its unit. These expressions and defenses themselves occur in sequences, some of which may be given labels (e.g., transference, repetition compulsion); or, restated, part of the long stream of patient behaviors, such as those labeled transference or repetition compulsion, may be seen as processes containing various expressions and defenses as units. Finally, the whole long stream of patient behavior, up to the point of patient change, may be seen as a process given a label denoting some form of neurosis or psychosis. This process may also be seen as having expressions of affect and defenses against affect as units.

€ he Many interrelated events in the lives 9

In similar fashion, therapist behavior may be broken down into a series of interrelated processes. One may also study patient-therapist interactions as ongoing processes, for example, transference-countertransference binds, equalization processes (Jaffe, 1964; Lennard & Bernstein, 1960; Matarazzo, Wiens, Matarazzo, & Saslow, 1968), and games (Berne, 1964).

As patient and therapist interact over a 
period of months some of the patient's be- 
havior processes, hopefully, will change. It is 
also conceivable that the therapist’s behavioral 
processes would change correspondingly. The 
sequential order in which both therapist and 
patient behavior changes is also a process 
having as its units each new form of the ori- 
ginal behavior as it changes. The process of 
change is perhaps what is most commonly 
referred to as the “therapeutic process” in 
process-oriented studies. It is also the most 
difficult to explicate. In fact, it is impossible 
to explicate until one has reliably delineated the 
initial or base-line sampling of patient and 
therapist processes in such a manner that any 
subsequent change can be detected. 

If the reader is convinced of the desirability 
of conceptualizing psychotherapy as a series 
of ongoing processes, the question arises as to 
the availability of statistical procedures that 
would enable one to examine process character- 
istics of psychotherapy. The following section 
presents a mathematical model for evaluating 
the processes previously described.


STOCHASTIC PROCESS DEFINED 


Parzen (1962) defined a stochastic process 
as follows: 


The theory of stochastic processes is generally defined as the “dynamic” part of probability theory, in which one studies a collection of random variables (called a stochastic process) from the point of view of their interdependence and limiting behavior. One is observing a
stochastic process whenever one examines a process 
developing in time in a manner controlled by proba- 
bilistic laws. Examples of stochastic processes are 
provided by the path of a particle in Brownian motion, 
the growth of a population such as a bacterial colony, 
the fluctuating number of particles emitted by a radio- 
active source, and the fluctuating output of gasoline 
in successive runs of an oil refining mechanism. Sto- 
chastic or random processes abound in nature. They 
occur in medicine, biology, physics, oceanography, 
economics, and psychology, to name only a few scien- 
tific disciplines [p. v]. 


The sequence of statements emitted by 
patient and therapist would seem to qualify 
as a “collection of random⁶ variables” that
may be studied “from the point of view of 
their interdependence and limiting behavior.” 
Certainly as one reads Rogers (1958, 1959, 
1961), Fenichel (1941), or Greenson (1967), 
one senses that therapy is a process which is 
developing in time. According to the theory 
and practice of psychotherapy, statements of 
patient and therapist do limit and control 
each other to a degree. It is hoped that the 
proposed model will help explicate the manner 
in which patient and therapist behavior develop
in time, interrelate, and limit each other.


STOCHASTIC PROCESS ANALYSIS DESCRIBED


It was suggested earlier that psychotherapy be considered as a long series of behaviors emitted by patient and therapist. Let us assume for the sake of this presentation that this stream of behaviors has been coded into discrete behavioral categories.⁷ The process analyses that are suggested by this article focus not upon the behavior categories themselves but upon the transitions between those categories.

6 “A random phenomenon is defined as an empirical phenomenon that obeys probabilistic, rather than deterministic laws [Parzen, 1962, p. 7].” Though psychological data may be conceived theoretically as “determined,” the data occur empirically as random phenomena, that is, not completely predictable. The extent to which psychological data are not empirically predictable is the extent to which they may be considered random.

7 The problem of
$ selecting relevant units and cate-
Re (eH. OW 1956; Berelson, 1952;
uray, 1956; FK. Salzinger. Some problems of re-
erbal behavior. Paper pre-
rence on Methods of Measurement
Quran Behavior, Montreal, September
The transition between two events (in this case behavior categories) is labeled a contingency. The first event of a contingency is labeled an antecedent event; the second, a consequent event. The probability with which a consequent event is expected to occur given its antecedent is labeled a contingency magnitude. The contingency magnitude indicates the likelihood that a process will move from the antecedent event in question to a given consequent. For example, in the sequence a → a → a → b → b → c → a, Category a occurs as an antecedent three times. It is followed by a twice and by b once. Thus the probability of a following a, or the contingency magnitude of a → a, in the above sequence is .66 (2 transitions to a divided by 3 occurrences of a as an antecedent), and the magnitude of the a → b contingency is .33 (1/3). In like manner, the contingency magnitude b → b is .50, b → c is .50, b → a is 0, c → a is 1.0, c → b is 0, and c → c is 0.
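
The tally described above can be sketched in a few lines of Python. This is only an illustration of the counting rule, not a procedure given in the article; the function name `contingency_matrix` and the nested-dictionary layout are assumptions made for the example. Note that the exact ratio 2/3 is what the text reports as .66.

```python
from collections import Counter, defaultdict

def contingency_matrix(sequence):
    """Return {antecedent: {consequent: contingency magnitude}}.

    Hypothetical helper: each adjacent pair in the coded sequence is
    treated as one contingency (antecedent followed by consequent).
    """
    categories = sorted(set(sequence))
    counts = defaultdict(Counter)
    for antecedent, consequent in zip(sequence, sequence[1:]):
        counts[antecedent][consequent] += 1
    matrix = {}
    for ant in categories:
        total = sum(counts[ant].values())  # occurrences of ant as an antecedent
        matrix[ant] = {
            con: (counts[ant][con] / total if total else 0.0)
            for con in categories
        }
    return matrix

# The example sequence from the text: a -> a -> a -> b -> b -> c -> a
matrix = contingency_matrix(list("aaabbca"))
for ant, row in matrix.items():
    print(ant, {con: round(p, 2) for con, p in row.items()})
```

The final event of a sequence has no consequent, so it is counted only as a consequent, never as an antecedent; that is why a appears four times in the sequence but only three times in the denominator.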
From the above paragraph can be seen how the proposed process analysis focuses not upon the relative frequencies of the categories themselves but rather upon the relative frequencies of transitions between categories, that is, contingency magnitudes; not how many times a, b, or c occur, but what tended to happen after a, b, or c occurred.

A process may, at least, be referred to at two different levels of abstraction. At a very concrete level a record of serial transitions between specific events may be considered a process. For example, the record of someone stepping alternately with his left and right feet (L-R-L-R-L-R-L) is a concrete representation of the “process of walking.” At a more abstract level the “process of walking” may be represented by a matrix of probability statements (i.e., contingency magnitudes) indicating that a right step is followed by a left step 100% of the time and never by a right step, and a left step is followed by a right step 100% of the time and never by a left step (L → R = 1.00, R → L = 1.00, L → L = 0, and R → R = 0).

The chief difference between a concrete sequence and its matrix representation is that the sequence is a detailed series of specific transitions while the matrix summarizes the occurrence of those transitions in terms of how they happened in general throughout the whole sequence or in terms of contingent relationships that might be expected on the average. Further, a sequence of events is but one of many instances of the possible sequential relationships represented by a matrix. A matrix contains information not only concerning what one might find on the average within a sequence but what one might find within all possible sequences having the same contingency magnitudes. For example, the matrix for walking would be as true for a specific sequence of steps begun on the right foot as for one begun on the left.
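
As a small check on the last point, the following sketch (with an illustrative helper, `transition_matrix`, that is not part of the article) tallies two concrete walking records, one begun on each foot, and shows that they reduce to the same matrix.

```python
from collections import Counter

def transition_matrix(sequence):
    """Relative frequency of each consequent given each antecedent."""
    states = sorted(set(sequence))
    pair_counts = Counter(zip(sequence, sequence[1:]))
    matrix = {}
    for ant in states:
        total = sum(pair_counts[(ant, con)] for con in states)
        matrix[ant] = {
            con: (pair_counts[(ant, con)] / total if total else 0.0)
            for con in states
        }
    return matrix

# Two different concrete sequences of steps (assumed example records)
left_first = transition_matrix("LRLRLRL")
right_first = transition_matrix("RLRLRLR")

print(left_first)
print(left_first == right_first)
```

The two records differ as sequences, yet the matrix discards the starting point and keeps only the contingency magnitudes, which is the sense in which one matrix stands for many possible sequences.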

It should be noted that in representing a sequence by a matrix, one ignores all the properties of the sequence except relative contingency frequencies. If a sequence can be described completely by such a matrix, the sequence may be considered to be Markovian.

Since the transition matrix is a complete description of a first-order Markov process, any statistic computed from one of its chains may be derived from the matrix itself. In other words, the first-order matrix may be used to obtain additional summarizing process characteristics. It would stand to reason, for example, that if one knew all the first-order contingencies of a sequence (i.e., the probability of b following a in one transition or of a following b in one transition) he could make statements concerning the probability of b following a in 2, 3, ..., n transitions because each larger unit is composed of a series of first-order contingencies. Figure 1 displays all possible


TABLE 1
EXAMPLE CONTINGENCY MATRIX

                  Consequent
Antecedent      a      b      c
a              .66    .33    0
b              0      .50    .50
c              1.0    0      0


TABLE 2
SECOND-ORDER MATRIX BASED ON CONTINGENCIES GIVEN IN TABLE 1

Category        a      b      c
a              .44    .38    .17
b              .50    .25    .25
c              .66    .33    .00


three-transition sequences that could follow
from the matrix in Table 1. 

The trees in Figure 1 give every possible sequence by which one could move from each of the states to every other state in from one to three transitions. The probability of each sequence may be obtained by multiplying the probabilities (contingency magnitudes) of each of the transitions involved [e.g., p(a→a→a→a) = .66 × .66 × .66 = .29; p(a→b→c→a) = .16; p(b→b→b→b) = .125; and p(c→a→a→a) = .44]. One can also determine the probability of moving from any state to any other state by obtaining the probability of every possible sequence between the two states in question, then adding the probabilities of those sequences. For instance, the probability of moving from a to a in three transitions is p(a→a→a→a) + p(a→b→c→a) = .29 + .16 = .45, and the probability of moving from c to b in three steps is p(c→a→a→b) + p(c→a→b→b), which is .38. Tables 2 and 3 give the probability of moving from each event to all of the events in Table 1 in two and three transitions (i.e., the second- and third-order contingency matrices).⁸
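
The two routes to a multi-transition probability, adding up the probabilities of every path or raising the first-order matrix to a power, can be compared directly. The sketch below uses the rounded magnitudes printed in Table 1 (.66 and .33, so row a sums to .99); the function names are illustrative assumptions rather than anything defined in the article.

```python
from itertools import product

STATES = ["a", "b", "c"]
# First-order contingency magnitudes as printed in Table 1 (rounded values)
P = {
    "a": {"a": 0.66, "b": 0.33, "c": 0.0},
    "b": {"a": 0.0, "b": 0.50, "c": 0.50},
    "c": {"a": 1.0, "b": 0.0, "c": 0.0},
}

def path_probability(path):
    """Multiply the contingency magnitudes along one concrete sequence."""
    prob = 1.0
    for ant, con in zip(path, path[1:]):
        prob *= P[ant][con]
    return prob

def n_step_probability(start, end, n):
    """Add the probabilities of every n-transition path from start to end."""
    total = 0.0
    for middle in product(STATES, repeat=n - 1):
        total += path_probability((start, *middle, end))
    return total

def matrix_power(n):
    """Raise the first-order matrix to the n-th power by repeated multiplication."""
    first = [[P[r][c] for c in STATES] for r in STATES]
    result = first
    for _ in range(n - 1):
        result = [
            [sum(result[i][k] * first[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)
        ]
    return result

third_order = matrix_power(3)
print(round(n_step_probability("a", "a", 3), 2))   # a to a in three transitions
print(round(n_step_probability("c", "b", 3), 2))   # c to b in three transitions
print(round(third_order[0][0], 2), round(third_order[2][1], 2))
```

Both routes give about .45 for a to a and about .38 for c to b in three transitions, in agreement with the worked figures in the text; the path enumeration makes the logic visible, while the matrix power is the practical way to obtain every cell at once.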

In analyses focusing on short sequences or 
early phases of longer processes, it is necessary
to take into account the state with which a 
process begins or is likely to begin. For ex- 
ample, the likelihood of c occurring after the 
first, second, or third transition of the process 
represented in Tables 1, 2, and 3 would vary 
considerably with whether a, b, or c occurred 
as the first event (as can be seen from Column 
c in each of the three matrices). The proba- 


8 The contingency magnitudes between all events may be obtained for any number of transitions simply by raising the power of the first-order matrix algebraically (Kemeny & Snell, 1960); for example, the two-transition or second-order matrix in Table 2 is the matrix in Table 1 squared, the third-order matrix in Table 3 is the first-order matrix in Table 1 cubed.

process has any cyclic properties these can be detected and described. For example, State c in Table 1 can return to itself in no fewer than three steps (note zero in c → c cell in Tables 1 and 2 but not in Table 3). One may talk about the relative stability of certain aspects of the process. For example, if the rows for States a, b, and c were .20, .20, .20, the process would be less stable than the one described by the present matrix. Periods of cycles, the paths, and the probability of getting into and out of cycles can also be determined.
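
The observation about State c can be checked mechanically by looking for the first power of the matrix whose diagonal cell is nonzero. The sketch below is an illustration only: it assumes the exact fractions 2/3 and 1/3 in place of the rounded .66 and .33 of Table 1, and `shortest_return` is a hypothetical helper name.

```python
from fractions import Fraction

STATES = ["a", "b", "c"]
# Table 1 with exact fractions substituted for the rounded magnitudes (assumption)
P = [
    [Fraction(2, 3), Fraction(1, 3), Fraction(0)],
    [Fraction(0), Fraction(1, 2), Fraction(1, 2)],
    [Fraction(1), Fraction(0), Fraction(0)],
]

def mat_mul(A, B):
    """Ordinary matrix product of two square matrices."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def shortest_return(state, max_steps=10):
    """Smallest number of transitions after which `state` can recur, or None."""
    i = STATES.index(state)
    power = P
    for n in range(1, max_steps + 1):
        if power[i][i] > 0:
            return n
        power = mat_mul(power, P)
    return None

for s in STATES:
    print(s, shortest_return(s))
```

States a and b can follow themselves immediately, whereas c first reappears after three transitions (c → a → b → c), which corresponds to the zero c → c cells in the first- and second-order matrices.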

Finally, one can determine statistically
whether a process has changed by comparing
the relative contingency frequencies during
different periods of time throughout a sequence.
Actually, changes in all of the process
descriptions listed above may be statistically
assessed by some variant of the chi-square test.
Anderson and Goodman (1957), Hoel (1954),
and Bartlett (1951) have described some of the
tests. The manner in which a process has
changed may be described in terms of changes
in the process descriptions listed above. For
example, one could show changes in contingency
magnitudes, probabilities of certain
sequences, mean distances between states, etc.,
across successive units of time by means of a
graph (e.g., see Raush, 1965).
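
One variant of such a chi-square comparison can be sketched as follows. This is an illustrative sketch only, in the spirit of a row-by-row test of homogeneity across periods; it is not claimed to reproduce the exact statistics of the cited authors. The function name and the transition counts are hypothetical.

```python
def homogeneity_chi_square(counts_by_period):
    """Compare transition counts across periods, one row (starting state) at a time.

    counts_by_period: list of square count matrices, one per time period.
    Returns (chi-square statistic, degrees of freedom).
    """
    n_states = len(counts_by_period[0])
    n_periods = len(counts_by_period)
    chi2, df = 0.0, 0
    for i in range(n_states):
        row_totals = [sum(c[i]) for c in counts_by_period]
        pooled = [sum(c[i][j] for c in counts_by_period) for j in range(n_states)]
        pooled_total = sum(pooled)
        if pooled_total == 0:
            continue  # this starting state never occurred
        cols = [j for j in range(n_states) if pooled[j] > 0]
        periods = [t for t in range(n_periods) if row_totals[t] > 0]
        for t in periods:
            for j in cols:
                # Expected count if the row's transition probabilities did not change.
                expected = row_totals[t] * pooled[j] / pooled_total
                chi2 += (counts_by_period[t][i][j] - expected) ** 2 / expected
        df += (len(periods) - 1) * (len(cols) - 1)
    return chi2, df


# Hypothetical counts of transitions among States a, b, c in two periods.
early = [[10, 30, 10], [5, 20, 25], [30, 15, 5]]
late = [[30, 10, 10], [5, 20, 25], [30, 15, 5]]

stat, dof = homogeneity_chi_square([early, late])
print(stat, dof)
```

The statistic would then be referred to a chi-square table with the returned degrees of freedom; only the first row differs between the two hypothetical periods, so that row supplies the whole statistic.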


STRUCTURAL ORGANIZATION OF
MARKOV MATRICES

Before the above analyses can be made, some
information concerning the nature of the
matrix must be noted. First, one must distinguish
its elements, the states. Basically there
are three kinds of states: a transient state, an
absorbing state, and a null state. A transient
state serves as an intermediary between other
states. It is always possible to move from some
state to a transient state and from that
transient state to some other state. States a, b, and c
in Table 3 are transient states. An absorbing
state does not lead to any other state besides
itself. While there are transitions from an
absorbing state to itself, there are no transitions
from an absorbing state to states other than
itself. Once a process has reached the absorbing
state it continues there indefinitely or until
something outside the process intervenes. The
inside groove of a phonograph record is an


TABLE 4

MATRIX EXAMPLE OF ABSORBING,
TRANSIENT, AND NULL STATES

State              A      T₁     T₂     N
Absorbing (A)      1.0    0      0      0
Transient₁ (T₁)    T      T      T      0
Transient₂ (T₂)    T      T      T      0
Null (N)           T      T      T      0

T denotes possible transition from row to column state.


example of an absorbing state. The other 
grooves are transient, each leading to the next, 
but the inside groove leads only to itself. It is 
possible to have two or more states leading to 
each other but not to the rest of the matrix. 
This set of states is called an absorbing set. 
The function of the null state is the converse 
of the absorbing state. The null state has 
transitions to other states, but no states 
have transitions to it. The convention of 
saying "good morning" when one first enters 
a room and never repeating the greeting 
once it is stated is an example of a null state. 
Obviously, for a null state to occur at all it 
would have to occur at the beginning of a 
process. Table 4 illustrates the above three 
kinds of states. The 1.0 in the A → A contingency
indicates that the absorbing state leads
only to itself; the zeroes in the absorbing row
and null column indicate that no transition
is logically possible. The magnitudes of the
contingencies labeled T may range from 0 to 1
(with the exception that T₁ → T₁ and T₂ → T₂
may not be 1.0).
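
The three definitions can be applied mechanically to a numeric matrix. The sketch below is a minimal illustration assuming the definitions given above; the numeric values substituted for the T cells of Table 4 are hypothetical, as is the function name `classify_states`.

```python
def classify_states(p, labels):
    """Label each state as absorbing, null, or transient.

    absorbing: the state leads only to itself (diagonal entry of 1.0)
    null:      no state, itself included, has a transition to it (zero column)
    transient: anything else
    """
    n = len(p)
    kinds = {}
    for i, label in enumerate(labels):
        if p[i][i] == 1.0:
            kinds[label] = "absorbing"
        elif all(p[r][i] == 0.0 for r in range(n)):
            kinds[label] = "null"
        else:
            kinds[label] = "transient"
    return kinds


# Hypothetical numeric version of Table 4 (rows and columns: A, T1, T2, N).
TABLE4_EXAMPLE = [
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 0.5, 0.3, 0.0],
    [0.1, 0.4, 0.5, 0.0],
    [0.3, 0.4, 0.3, 0.0],
]

kinds = classify_states(TABLE4_EXAMPLE, ["A", "T1", "T2", "N"])
print(kinds)
```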


THREE TYPES OF MARKOV CHAINS

Different combinations of the above types 
of states will result in different kinds of se- 
quences or chains. Actually, the chains may be 
classified in terms of the kinds of functions 
employed. Three types of Markov chains are 
discussed briefly: an absorbing chain and two
types of nonabsorbing chains, cyclic and non- 
cyclic, or regular. 

An absorbing chain contains at least one
absorbing state. The salient property of this
chain is that as the process continues the probability
of the x + 1 state being the absorbing
state increases to 1. In other words, no matter
where the process starts it will eventually
wind up in an absorbing state.
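
This property can be illustrated by following the probability of occupying the absorbing state from one transition to the next. The sketch below is illustrative only; the four-state matrix (one absorbing, two transient, one null state) is hypothetical, and the function names are not from the source.

```python
def step(dist, p):
    """Advance a probability distribution over states by one transition."""
    n = len(p)
    return [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]


def absorption_curve(p, start, absorbing, n_steps):
    """Probability of being in the absorbing state after 1..n_steps transitions."""
    dist = [0.0] * len(p)
    dist[start] = 1.0
    curve = []
    for _ in range(n_steps):
        dist = step(dist, p)
        curve.append(dist[absorbing])
    return curve


# Hypothetical chain: state 0 absorbing, states 1 and 2 transient, state 3 null.
CHAIN = [
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 0.5, 0.3, 0.0],
    [0.1, 0.4, 0.5, 0.0],
    [0.3, 0.4, 0.3, 0.0],
]

curve = absorption_curve(CHAIN, start=3, absorbing=0, n_steps=200)
print([round(v, 3) for v in curve[:5]], round(curve[-1], 6))
```

Starting the process in the null state, the probability never decreases from one transition to the next and approaches 1 as the process continues.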

A nonabsorbing Markov chain, technically 
referred to as an ergodic Markov chain, is 
composed of all transient states. The chief 
identifying property of an ergodic Markov 
chain is that its process may begin in any 
state and after a sufficient period of time reach 
any other state.

An ergodic chain may have cyclic properties 
such that it can be in certain states only at 
given times and never at other times. A simple 
example is the matrix representation of walking 
previously discussed. The process can only be 
in each state every other time no matter how 
long the process is iterated. Thus, while it is 
possible eventually to move from one state 
to any other in a cyclic ergodic chain, there 
are some time periods in which transitions 
between given states are not possible no matter 
how long the process continues.
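
The cyclic property can be made concrete by listing the numbers of transitions after which a state can return to itself; the greatest common divisor of those numbers is the period. The sketch below assumes that the walking example amounts to a two-state alternation (left step, right step); that matrix, the comparison matrix, and the function names are assumptions for illustration.

```python
from functools import reduce
from math import gcd


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def return_times(p, state, max_steps=20):
    """Numbers of transitions (up to max_steps) after which state can recur."""
    times, power = [], p
    for n in range(1, max_steps + 1):
        if power[state][state] > 1e-12:
            times.append(n)
        power = mat_mul(power, p)
    return times


def period(p, state, max_steps=20):
    """Greatest common divisor of the observed return times (0 if none)."""
    times = return_times(p, state, max_steps)
    return reduce(gcd, times) if times else 0


WALK = [[0.0, 1.0],   # assumed: a left step is always followed by a right step
        [1.0, 0.0]]   # and a right step by a left step

NONCYCLIC = [[0.5, 0.5, 0.0],   # hypothetical three-state chain for contrast
             [0.0, 0.5, 0.5],
             [1.0, 0.0, 0.0]]

print(return_times(WALK, 0, 6), period(WALK, 0))
print(return_times(NONCYCLIC, 2, 6), period(NONCYCLIC, 2))
```

A period of 2 for the walking matrix expresses that each state can be occupied only every other time; a period of 1 means no such restriction holds.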

A regular Markov chain is ergodic but not
periodic, that is, after a sufficient number of
transitions each state is accessible to any of
the other states for all subsequent time periods.
(Mathematically, as the matrix is raised in
powers, there is a point after which no cell
entry is zero.) Table 1 is a regular Markov
chain. The fact that there are no zeros in the
third-order matrix (Table 3) demonstrates
that each state is accessible to every other
state after at least three transitions. This
property of accessibility of all states after a
given number of transitions is in contrast
to the absorbing chain, in which only the
absorbing states are accessible after a sufficient
number of transitions.
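
The parenthetical criterion suggests a direct check: raise the matrix to successive powers and stop at the first power with no zero cell. The sketch below is illustrative; the matrices are hypothetical, and the search limit uses the known bound that a regular chain with m states reaches an all-positive power by the power (m - 1)² + 1.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def first_all_positive_power(p):
    """Smallest power of p with no zero entry, or None if the chain is not regular."""
    m = len(p)
    limit = (m - 1) ** 2 + 1  # no need to search beyond this power
    power = p
    for n in range(1, limit + 1):
        if all(v > 0 for row in power for v in row):
            return n
        power = mat_mul(power, p)
    return None


REGULAR = [[0.5, 0.5, 0.0],   # hypothetical chain; zero cells vanish at the third power
           [0.0, 0.5, 0.5],
           [1.0, 0.0, 0.0]]

CYCLIC = [[0.0, 1.0],         # hypothetical two-state alternation; never all positive
          [1.0, 0.0]]

print(first_all_positive_power(REGULAR))
print(first_all_positive_power(CYCLIC))
```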

It is possible to construct Markov chains
having some combination of the functions
described in the above three general types of
Markov chains. Often these more complex
chains are needed when modeling natural
phenomena.
through simple one-step contingency magnitudes
to increasingly higher-order descriptions
of processes using stochastic process analyses.
These analyses provide measurable descriptions
of an ongoing process by locating events in
temporal relationship to each other (e.g.,
contingency magnitudes 1, . . . , n transitions
apart, mean and standard deviation of the
distance between events, most probable
sequences found in a process, and the higher-order
process description connoted by type of
Markov chain). These analyses enable one to
abstract from the whole ongoing stream of
events certain patterns or descriptions that
characterize it. Furthermore, these descriptions
are in a form that permits statistical
assessment of whether or not change has
occurred and indicates the nature of that change.
ANALOGIES BETWEEN STOCHASTIC
ANALYSIS AND CLINICAL
PROCESS ANALYSIS

The above approaches to describing sequential
data have many similarities with clinical
observation and description. The therapist
essentially observes a word-by-word, thought-by-thought,
nonverbal-cue-by-nonverbal-cue
stream of events emitted by the patient;
he responds in terms of similar behavioral units.
As the therapist and patient interact,
patternings of words and ideas begin to emerge.
Again and again Freud, [...],
Greenson, and others admonished therapists
to watch for sequential patterns and subtle
changes in patterns that occur in the process
of therapy. Many confrontations [...]
take the form "whenever x happens [...]
to y." The strategies of noting [...] have
been [...]
therapists for a long time. As these patterns
are diagnosed the therapist [...]
pattern of interaction with the [...]
could possibly be viewed as a [...]
configuration (e.g., [...]
self-esteem, desensitize or [...]
interaction by noting changes in the behavioral
pattern of interest. In summary, clinical observation
and description would appear to have
many similarities to stochastic process analyses;
further, stochastic process analyses provide
the capability of making the complex
behavioral patterns which emerge in psychotherapy
operationally definable (as sets of
contingencies, most probable sequences, etc.),
of statistically assessing whether or not change
in those patterns does occur, and of demonstrating
the manner in which the change occurs.


This article has suggested an approach to
studying temporal relationships among events
occurring sequentially. It should be emphasized
that any number of different kinds of stochastic
process analyses may be used depending upon
the nature of the data and the question being
explored by the study. It is hoped that this
very abbreviated presentation will stimulate
the reader to consider the Markov chain as a
possible tool for studying psychotherapeutic
processes. A more elaborate mathematical
treatment of Markov chain statistics by
Kemeny and Snell (1960, 1963), Kemeny,
Snell, and Thompson (1956), Bharucha-Reid
(1960), Parzen (1962), Hoel (1954), Bartlett
(1951), and Anderson and Goodman (1957)
should provide adequate preparation for carrying
out such analyses. A study by Jaffe (1968)
which presents Markov analyses of dialogues
between college students in a laboratory setting
and the Raush (1965) study (see Footnote 3)
are also suggested as helpful examples as to
how to collect data for analysis by Markovian
statistics.
SOME OBSERVATIONS ON TWO METHODS IN PSYCHOLOGY


H. E. BROGDEN ¹


Purdue University 


Distinctions between the experimental and the observational (or correlational) 
approaches are made explicit. It is suggested that the observational approach 
involves a distinct set of problems and reasons are given for believing that, 
for these problems, controlled experimentation is inadequate. Basic statistical 
models, including some that permit error of measurement in the independent 
variables, are briefly considered. It is suggested that fixed models are not 
generally appropriate for the observational method and that random models 
do not normally yield useful information in a controlled experiment. In par- 
ticular, it is contended that the magnitude of variances of a random model 
and of the error variance of both random and fixed models is generally arbi- 
trary in experimental studies. This conclusion suggests that a variety of co- 
efficients associated with analysis of variance and regression models are inap- 
propriate in an experimental setting and that accuracy of prediction, in the 
usual sense, is not a proper objective of experimentation. Finally, this article
maintains that constructs developed through an observational approach are
not likely to be useful in an experimental science.

In psychology today the controlled experiment
is generally regarded as a standard approach
that is basic to scientific development.
Yet, in such areas of psychology as individual
differences and tests and measures, other
techniques, which might be described as
observational, are widely used. Aspects of the
distinction between these two contrasting
approaches have been discussed by Cronbach
(1957) and others. It is the thesis of this
article that there are distinct objectives to
which each general approach is applicable and
that some confusion still exists regarding
(a) the nature of the distinction in the basic
research methodology appropriate to each,
(b) the appropriateness of various statistical
models and coefficients to the two approaches,
and (c) the implications of these distinctions
for the development of constructs.

The experimental approach has a standard
meaning that requires no elaboration. We
distinguish the observational from the experimental
approach in that the independent variables
are observed but not manipulated. In a
similar vein Cronbach (1957) stated, “While
the experimenter is interested only in the
variation that he himself creates, the correlator
finds his interest in the already existing
variation between individuals [p. 671].”

1 Requests for reprints should be sent to H. E.
Brogden, Department of Psychology, Purdue University,
Lafayette, Indiana 47907.


Cronbach’s concept of the correlational 
method corresponds to our observational 
approach. Content or problem areas in psy- 
chology characterized by the use of the ob- 
servational method include selection and clas- 
sification, the problem of prognosis in clinical 
psychology, factor analysis of ability and 
personality measures, individual differences, 
and research on construct validity. 

The objectives of the experimental ap- 
proach need no comment. While many ob- 
servational studies have directly applied ob- 
jectives that are reasonably apparent, the 
purpose of research on construct validity, 
factor analysis of psychological measures, or 
individual differences in general is not clear.
Such research has, for the most part, under- 
standable objectives if it seeks to provide 
generalizations and theory in support of 
directly applied research in selection, classifi- 
cation, prognosis, and the like, and many 
studies are interpreted in this way. The objec- 
tives are again understandable, though an 
available methodology may well be questioned 
if the investigator explicitly seeks to accom- 
plish experimental objectives through an ob- 
servational approach. A study not explicitly 
pointed toward either objective often seems 
to be undertaken with the idea that contribu- 
tions of both approaches will ultimately merge 
into a common body of knowledge. This seems 
to be the position of Cronbach (1957). Various
of our later comments are in conflict with
this position.


DISTINCTIONS IN THE BASIC LOGIC
OF THE TWO APPROACHES


We ignore, for the moment, statistical 
models and center on the distinctiveness of 
the generic structure of research in the two 
approaches. Our argument, in essence, is that 
the methods and findings of controlled experi- 
mentation are not appropriate to observa- 
tional problems, although we accept the use 
of controls to enhance accuracy of measure- 
ment. While the reverse position might also 
be considered, the inadequacies of the obser- 
vational approach applied to an experimental 
problem are well recognized and need not be 
elaborated here.

Consider the following situation. Say the 
knowledge were available such that, through 
control of genetic structure and laboratorylike 
control of a life history, a defined perform- 
ance by man in a given setting could be pre- 
determined at will. Such a scientific develop- 
ment would be of little, if any, clear value in 
selecting from a group of applicants, unless 
each of the applicants had been subjected to 
the controls mentioned above and a record 
of the manipulations were available. The xs 
would be unknown in practice and f(x) would 
then be of little value in estimating y. While 
such knowledge may suggest tests of value
in selection, even this benefit is dubious in
practice unless it is accompanied by knowledge
of the way, in the relevant culture, the
environment has been structured; that is,
knowledge of the treatments the culture
accords its members. Even if a record of the
[...] and known to be
[...] of some aspect of the
[...] may be low since the
[...] all applicants are reared
[...] be such that individuals receive very
nearly the same treatments. Similar reasons
[...] inferences, from experimental evidence,
[...] the relationship between
[...] among any set of
variables in the observational study.

Tests [...] generally differ in kind
from the usual independent variables of a
[...]; in this context,
[...] independent variables,
[...] is simply not a
manipulable variable, in the sense that amounts
of fertilizer and the like are manipulable,
and thus cannot function as an independent
variable in the standard controlled experiment.
Transfer of information between
the two approaches is limited as a result.
Furthermore, experimental controls are not
appropriate in an observational study. In
selection research, experimental controls are
generally avoided, and one seeks a selection
measure, possibly a regression function, that
will permit maximization of the criterion
scores of those selected. If we establish a
correlation between the measure and an appropriate
criterion in a sample of applicants,
then experimental controls, which are not
part of this well-known procedure, are
irrelevant. The objectives of such a study are
fulfilled to the degree that we support the
validity of our selection measure in the
population and thus in future samples of
applicants. Since the typical variable of a
selection study cannot be manipulated, the
use of experimental controls in such research
implies determination of predictor validity in
subsamples with fixed or constant values of
the control variables, and it should be obvious
that this evidence would support the usefulness
of the predictor only for such subsamples.
While it would be possible, from the results
of such subsample analysis, to develop a
regression function with demonstrated validity
in the entire sample, this approach implies
that the control variables are predictors.

We suggest now additional reasons for believing
that controls are not appropriate in
any observational study. When the logic of
controlled experimentation is rigorously applied
to the tests or measures of an observational
study, one realizes that it is rarely
possible to hold such variables constant by
manipulation, that the unending number of
potential tests and measures makes it
impossible to hold “all other variables” constant
by selecting subgroups having fixed values of
the controls, and, finally, that if one could
identify subgroups with fixed values of the
control variables, the true score variance of
any test or measure under investigation would
approach zero for such subgroups. The justification
of the last statement is evident if one
considers that “all other variables” would
include other more reliable measures of
the traits or characteristics involved in the
independent variables of the investigation.


BASIC STATISTICAL MODELS AND THEIR
IMPLICATIONS FOR EXPERIMENTAL
AND OBSERVATIONAL APPROACHES


Some basic statistical models are now 
considered together with features of these 
several models that illuminate differences in 
logic between our two general approaches 
and suggest basic distinctions in the models 
properly associated with each. 

Consider the following statement of the
general linear model:

$$y_{ji} = \beta_0 + \beta_j + e_{ji} \qquad [1]$$

where $\beta_j$, a parameter, is the effect associated
with the jth fixed set of conditions, $\beta_0$ is the
general mean, $e_{ji}$ is the error attached to the
ith observation under the jth set of conditions,
and $y_{ji}$ is an observation on the dependent
variable. The $e_{ji}$ are, by the model, independently
and identically distributed with expectations
of zero. We interpret $\beta_j$ as the effect
of a change in conditions, since the presence
of $\beta_0$ implies the restriction that the average
$\beta_j$ (across j) is zero.
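
A small numerical illustration of Equation 1 may help fix the roles of the terms. The sketch below assumes a balanced design with the same number of observations under each set of conditions; the data and the function name are hypothetical. The general mean is estimated by the mean of the cell means, each effect by the deviation of a cell mean from it, and the errors by the residuals within cells.

```python
def fit_fixed_effects(groups):
    """Estimate beta_0, the beta_j, and the residual errors for Equation 1.

    groups: one list of observations per fixed set of conditions (balanced design).
    """
    cell_means = [sum(g) / len(g) for g in groups]
    beta0 = sum(cell_means) / len(cell_means)        # general mean
    betas = [m - beta0 for m in cell_means]          # effects; they sum to zero
    residuals = [[y - m for y in g] for g, m in zip(groups, cell_means)]
    return beta0, betas, residuals


# Hypothetical observations under three fixed sets of conditions.
observations = [[4.0, 6.0, 5.0],
                [7.0, 9.0, 8.0],
                [10.0, 12.0, 11.0]]

beta0, betas, residuals = fit_fixed_effects(observations)
print(beta0, betas, residuals)
```

The estimated effects sum to zero by construction, which is the restriction that the presence of the general mean imposes.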

In a two-way analysis of variance, $\beta_j$ can be
interpreted as a cell mean and a typical fixed-effect
analysis of variance model is a special
case of the linear model. Now if we assume
that all interactions are zero, and that the
main effects of each analysis of variance factor
may be written as a multiple of a single regression
coefficient, we have a strictly linear
regression model, and the regression coefficients
are each interpretable as the experimental
effect of any unit increase in the given
independent variable. Thus a strictly linear
regression model is a restricted case of an
analysis of variance model and hence a restricted
case of the general linear model.
Implicit in the conversion to a regression
model is the assumption that the levels of the
analysis of variance factor are quantifiable.
For the nominal variable, persons, for example,
the regression model would not be appropriate.
In an experiment the effects of extraneous
variables are controlled or randomized and
thus $\beta_j$ represents the experimental effect
of experimental conditions. The postulated
characteristics of the errors depend upon
proper design of the experiment, and it then
follows that the expected value of $y_{ji}$ is $\beta_j$ +
$\beta_0$ and that $\beta_j$ is interpretable as an experimental
effect.

Alternatively, when applying the model to
observational data involving no specific randomization
procedures, we can define $\beta_j$ as the
expected value of $y_{ji}$ and define error as the
residual or $y_{ji}$ minus $\beta_j$ and thus insure that
the expectation of the errors is zero. It may
then be reasonable to suppose that the errors
are independently and identically distributed.
It is clear that the interpretation of $\beta_j$ must
be clouded. If randomization is lacking, the
independence of the errors will often be particularly
open to question and, as a result,
still additional bias in $\beta_j$ may be introduced in
some instances. It is the second interpretation
of $\beta_j$ that has been adopted in test theory
(Lord & Novick, 1968, p. 29), and it is probable
that this is the only permissible interpretation
in the great majority of observational
studies. In classical test theory, as one example,
$\beta_j$ may be interpreted as true score on a
test, the set of experimental conditions as a
person, and the $e_{ji}$ may depend upon the particular
occasion (including measurement conditions)
on which the test is administered to
a person.

The fixed regression model in the observational
approach usually implies that each $\beta_j$
of Equation 1 is defined as the mean of $y_{ji}$
across a population of persons and a population
of occasions, for each jth fixed vector or
pattern of test scores, and error is then defined
as a residual that is random across persons
and occasions. Thus, each member of the
jth population is characterized by the jth fixed
pattern of test scores. Since the test scores are
quantifiable, a strictly linear model may be
found to fit the $\beta_j$.

Now randomization of the errors and experimental
controls may characterize an observational
study. In testing, an occasion
could be randomly and independently drawn 
for each administration of each test to each 
person. In fixed regression, a random and in- 
dependent draw of both persons and occa- 
sions is needed. Appropriate controls should 
be used in both cases. We can, then, manipu- 
late the nominal variable, persons, and esti- 
mate the experimental effect of the placement 
of a person in a job as measured by a criterion
variable, and we can establish the mean
person effect associated with a fixed vector of 
test scores in the same sense, but the signifi- 
cance of such treatment effects is limited, 
since, in either case, the independent variable 
is purely nominal. Although the elements of 
the fixed vector of test scores might appear 
to specify a set of experimental conditions in 
the fixed regression model, it should be ap- 
parent that the populations associated with 
the fixed vectors of test scores will differ in 
many ways over and above those specified by 
the test score vectors. The manipulations and 
controls, as described above, pertain to per- 
sons, not test scores, and the experimental 
effects are thus associated with persons as 
experimental conditions. If a strictly linear 
regression model is fitted to the mean person 
effects associated with the fixed test score 
vectors, it should be clear that the regression 
coefficients of this model cannot be inter- 
preted as the experimental effect of a unit in- 
crease in the independent variable. Such 
estimates of experimental effects in an obser- 
vational study are, in another sense, merely 
the outcomes of the process that establishes 
a test score, or a criterion score, and proper 
randomization and control are means of eliminating
bias in this psychological measurement
procedure.

Although fixed models generally imply no 
error in the xs, there is a little-known strictly 
linear regression model (Kendall & Stuart, 
1967, p. 408) that permits such error and 
that seems appropriate to the manipulated 
independent variable of an experiment. With 
error in $x_j$ one might modify the strictly linear
regression model and write:

$$y_{ji} = \beta(x_j^* + \delta_{ji}) + \beta_0 + e_{ji} = \beta x_j^* + \beta_0 + (\beta\delta_{ji} + e_{ji}) \qquad [2]$$

where $\beta$ is the slope and $\beta_0$ the regression
constant. In place of $x_{ji}$ we have $x_j^*$, the
fallible estimate of $x_{ji}$ which represents our
intention, plus $\delta_{ji}$, or the error in $x_j^*$. If the
manipulative operations that fix $x_j^*$, and seek
to fix $x_{ji}$, are independently undertaken for
each observation, both $\beta\delta_{ji}$ and
$\beta\delta_{ji} + e_{ji}$ should be independently and
identically distributed. We emphasize that $x^*$
[...] both error of measurement and a component
of the true measure, $x_{ji}$. If the expected value
of $\delta_{ji}$ is also zero, and the expected value of
$y_{ji}$ is then $\beta x_j^*$ plus $\beta_0$, it can be seen that the
model of Equation 2 is simply a special case of
strictly linear fixed x regression. If
we interpret $\beta$ as the experimental effect of
any unit increase in $x_j^*$, this has
the same meaning as the experimental effect
associated with a unit increase in $x_{ji}$. It is
readily seen that the model can be generalized
to more than one independent variable.

Since the expected value of $\delta_{ji}$ is zero for
each of the several fixed values $x_j^*$ implied by
the design, the error of measurement, $\delta_{ji}$, is
unrelated to $x_j^*$. Obviously, for fixed $x_j^*$, $x_{ji}$
and $\delta_{ji}$ are perfectly correlated. In test theory
terminology, error is correlated with true score
but uncorrelated with observed score. Given
the direct contradictions between these aspects
of the model and standard test theory, it is
very apparent that this model is not appropriate
in test theory nor, for that matter, in
any observational study. In such studies error
is independent of true score.
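
The contrast between the two error structures can be illustrated by simulation. The sketch below is a hedged illustration, not part of the original argument: all parameter values, the seed, and the function names are assumptions. In the experimental case the intended value is fixed and the true value deviates from it, as in Equation 2; in the observational case the observed score is the true score plus an independent error, as in test theory. The slope of y on the fixed intended value should recover the slope of the model, whereas the slope of y on the fallible observed score should be attenuated.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(beta=2.0, beta0=1.0, levels=(1, 2, 3, 4, 5),
             n_per_level=2000, seed=0):
    """Return (slope on intended x*, slope on fallible observed score)."""
    rng = random.Random(seed)
    intended, y_exp = [], []   # experimental case (Equation 2)
    observed, y_obs = [], []   # observational case (test-theory error)
    for level in levels:
        for _ in range(n_per_level):
            # Experimental: the true value is the intended value plus error delta.
            true_x = level + rng.gauss(0.0, 1.0)
            intended.append(float(level))
            y_exp.append(beta * true_x + beta0 + rng.gauss(0.0, 1.0))
            # Observational: the observed score is the true score plus error.
            true_score = float(level)
            observed.append(true_score + rng.gauss(0.0, 1.0))
            y_obs.append(beta * true_score + beta0 + rng.gauss(0.0, 1.0))
    return ols_slope(intended, y_exp), ols_slope(observed, y_obs)


slope_experimental, slope_observational = simulate()
print(round(slope_experimental, 2), round(slope_observational, 2))
```

With these assumed values the first slope should fall close to 2, while the second should fall close to 2 × 2 / (2 + 1), or about 1.33, because the variance of the five equally frequent true scores is 2 and the error variance is 1.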
Fixed regression models seem to be ill suited
to the observational approach. Both the sampling
of occasions and the sampling of persons,
as required by the model, do not correspond
to current practice in observational
studies. Error is normally present in the predictors
of an observational study, and the
model described above, which permits error in
the xs, is appropriate to experimental studies
only. Fixed analysis of variance designs
appear to have limited usefulness in observational
studies, although detailed [...] on
this point, beyond those already implied, are
[...] preferable for observational studies, and since
this conclusion seems to be generally accepted,
discussion of this particular issue is not extended.

We presume that the value of fixed models 
for controlled experiments is clear and gen- 
erally accepted, and we turn to a considera- 
tion of random models for this purpose. We 
stress initially the arbitrary character of the 
variability in an independent variable. In a 
strictly experimental study, it is the experi- 
menter who chooses the particular values and 
the range of each treatment factor in an analy- 
sis of variance or regression analysis. Since the 
degree of variation of the independent varia- 
bles reflects arbitrary choices of the experi- 
menter, it follows that the degree of variation 
in the resulting treatment effects is arbitrary. 
Because the distribution of the treatments is 
created by the experimenter, it is also artificial 
to postulate the existence of a population of 
treatment effects and certainly artificial to 
hold that the probability distribution of the 
treatment effects is Gaussian. Thus, the de- 
sign as well as the state of nature determines 
the findings that distinguish a random from 
a fixed model. We conclude that random de- 
signs have doubtful relevance to studies in- 
volving manipulated independent variables 
and are thus doubtfully relevant to most experimental
studies. Similar considerations apply
to estimates of the error variance whether
the design is fixed or random. The magnitude
of $\sigma_e$ reflects choices of the experimenter between
control or randomization of the effect
of extraneous variables and is likewise arbitrary.
In research on systems, random designs
may sometimes be relevant, but such research 
is rare in psychology today. We recognize 
that variances are relevant to an evaluation 
of the power of an experiment, but this consideration
does not conflict with our position.

Now it is true that a person factor of an
experimental analysis of variance must be regarded
as random since interest usually does
not center in the (fixed) persons of a particular
study. Generally, however, the main
effects of this factor of an analysis of variance
are of little direct interest to an experimenter,
and, if no interactions are expected, a design
treating individual differences as error
would be reasonable. If genetically homogeneous
subjects reared under identical conditions
were available, such a design would be fully
appropriate and, probably, generally preferred.

It is suggested next that a variety of coefficients,
including particularly the reliability
coefficient, the correlation coefficient, and the 
proportion of variance attributed to an analy- 
sis of variance component, have little meaning 
or relevance in an experimental study. We 
presume that variances of the independent 
variables, correlation coefficients, reliability 
coefficients, and the like are defined in fixed 
analysis of variance or regression designs so 
that comments herein are general to such co- 
efficients whether they derive from fixed or 
random designs. For the purpose of this dis- 
cussion, the variance of y in fixed designs is 
assumed to include variation across experi- 
mental conditions. 

The belief that such coefficients are basi- 
cally irrelevant to an experimental study 
stems from prior observation that the vari- 
ance of the experimental effects and of error 
are both in large part determined by the ex- 
perimenter. These, in turn, largely determine 
the magnitude of a correlation coefficient, a 
multiple correlation coefficient, or the percent 
of variance in y attributable to one or more 
analysis of variance components. Since the
component variances are arbitrary, we must 
conclude that all such coefficients are equally 
arbitrary. 
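
The dependence of such coefficients on the experimenter's choices can be shown with a short computation. The sketch below assumes a strictly linear model with positive slope, equally many observations at each chosen level of the independent variable, and an error standard deviation written `sigma_e`; all numerical values and names are hypothetical. The implied correlation follows from the ratio of explained variance to total variance.

```python
from math import sqrt


def population_variance(values):
    """Variance of the chosen levels, treating them as equally frequent."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def implied_correlation(beta, levels, sigma_e):
    """Correlation between x and y implied by a linear model y = beta*x + error."""
    explained = beta ** 2 * population_variance(levels)
    return sqrt(explained / (explained + sigma_e ** 2))


BETA, SIGMA_E = 2.0, 4.0   # same experimental effect and error in both designs

r_narrow = implied_correlation(BETA, (4, 5, 6), SIGMA_E)    # narrow range of x
r_wide = implied_correlation(BETA, (0, 5, 10), SIGMA_E)     # wide range of x

print(round(r_narrow, 3), round(r_wide, 3))
```

The experimental effect of a unit increase is 2 in both designs, yet the correlation rises from about .38 to about .90 solely because a wider range of levels was chosen.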

Reliability coefficients, whether they per- 
tain to independent or dependent variables, 
must depend upon true score and error vari- 
ance, and these, in turn, are usually arbitrary 
in the sense mentioned above. The true score 
variance of a manipulated independent varia- 
ble is always arbitrary, and all standard anal- 
ysis of variance and regression models imply 
no error in such variables. If the true score 
variance of y is interpreted as the variance of
the experimental effects, it, too, is clearly arbitrary.
Though not arbitrary, true score variance
of y that reflects individual differences,
whether estimated separately or from the experimental
data, has no clear relevance to an
experimental investigation. High reliability
might then stem from large individual differences,
and large individual differences are
normally considered undesirable in an experiment.
Reliability has natural meaning in observational
studies. If individual differences
are large, more error can be tolerated than
when such differences are small. It is integral 
to test theory. In experimentation, it seems 
clearly less relevant than a measure of error 
variance alone. 

If one agrees, as we have argued, that $\sigma_e$
and the degree of variation of an independent
variable in both fixed and random designs are
clearly arbitrary, one must further agree that
measures of accuracy of prediction that derive
therefrom cannot be a legitimate concern
of experimentation. We have already commented
on the arbitrary character of measures
of accuracy of prediction such as the correlation
coefficient. As a measure of accuracy of
prediction, $\sigma_e$ is arbitrary also. We note that
$\sigma_e$ may, but need not, relate to the accuracy
of the estimation of experimental effects. If
one chooses, because of convenience or cost, to 
increase the number of observations rather 
than reduce the magnitude of the error term 
by more precise controls, there need be, with 
proper planning, no loss in the accuracy of 
the estimate of the experimental effect as a 
result of this choice. The trade-off between 
these two factors is well known. 
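
The trade-off can be written out for the simplest case. The following is only an illustrative sketch: the two-group design, the per-group sample size n, and the symbol SE are assumptions introduced here, and the error standard deviation is written as \sigma_e.

```latex
\mathrm{SE}(\bar{y}_1 - \bar{y}_2) \;=\; \sigma_e \sqrt{\frac{2}{n}}
```

Under these assumptions, halving \sigma_e through more precise controls and quadrupling n both halve the standard error of the estimated effect, so either route can reach the same accuracy.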

When the implications of findings for practice are considered, the magnitude of the σ_e 
associated with the research study is, in itself, irrelevant and the variability of results 
evident during applications is also unimportant. If a research study establishes for a 
specified range of conditions that the experimental effect of a ton of fertilizer is always 
two tons of corn, the magnitude of σ_e is clearly not pertinent so long as the standard 
error of the experimental effect is negligible. Furthermore, the expectation of major 
fluctuations in the [...] of different plots or different years [...] one from the use of 
fertilizer. Vagaries of rainfall, soil condition, and the like may produce high variability 
in the actual yield, but the yield is always two tons greater than it otherwise [...] long 
as the [...] application is [...] formulation only as they related to the power of the 
experiments or the accuracy of the estimates of the parameters. If one utilizes an 
established law in further development of theory, then one is mainly concerned with the 
statement of the law itself. 

The possible meanings of accuracy of pre- 
diction as we have used it should be reason- 
ably clear. There are possible alternative 
meanings not subject to the criticisms voiced 
above. 


EXPERIMENTAL VERSUS OBSERVATIONAL 
CONSTRUCTS 


In approaching the topic of constructs, the distinctiveness of the two approaches is [...] 
our major thesis. To support this thesis we offer a conception of a simple experimental 
construct and give reasons for believing that constructs which characterize the 
observational approach do not fulfill the requirements of such experimental constructs. 

Suppose y_i, y_j, . . . are variables with distinct operational definitions which we believe 
to be conceptually equivalent. If, through controlled experimentation using an analysis 
of variance or regression model, one estimates the experimental effects of a number of 
[...] for a number of independent variables using separate observations on the dependent 
variables y_i, y_j, . . . , and if one then [...] that these estimates of experimental effects 
are consistent with the hypothesis that the experimental effects are within a linear 
transformation of each other for all [...] of experimental conditions, then y_i, y_j, . . . 
may be regarded as conceptually equivalent for at least the given set of independent 
variables. Evidence such as this supports a single experimental construct to replace 
[...] dependent variables. We emphasize that such a construct basically derives from 
estimates of experimental effects. 

It might appear that closely [...] results can be obtained from an observational study if 
measures of the dependent variables y_i, y_j, . . . are obtained for each individual [...] 
the past history of the person is [...] represent a set of experimental treatments or 
conditions. Then the correlation between [...] whether the two variables are, except for 
error of measurement, within a linear transformation of each other, approximately at least. It 
should be apparent, however, that the treat- 
ments accorded each person are essentially 
unknown and the “design” is haphazard and 
inadequate. The genetic structure of the per- 
sons will bias the obtained correlations for 
our present purpose, and confounding of sets 
of independent variables could lead to even 
more serious problems. Correlated indepen- 
dent variables, for example, could lead to 
correlation between the effects of two de- 
pendent variables that would be uncorrelated 
in an orthogonal design. To the degree that 
studies of construct validity depend upon 
correlations such as those described, it would 
appear that they do not support the adequacy 
of a construct for experimental purposes. 
Many other types of observational studies, in- 
cluding many factor analyses, are based on 
correlations across persons, and, however use- 
ful they may be in support of other objectives, 
they too are inadequate for experimental pur-
poses. 

Perhaps it will be helpful to briefly de- 
scribe, as a concrete illustration, a study of 
dependent variable constructs (Whimbey & 
Denenberg, 1966). Using rats of homogeneous 
genetic structure as subjects, life histories 
were controlled by manipulating a variety of 
independent variables believed to be signifi- 
cant to emotional and sexual adjustment. 
Measures were then obtained on a substantial 
set of dependent variables. Given genetic 
homogeneity of the subjects and adequate 
randomization, it was reasonable to suppose 
that the interactions between subjects and the 
experimental factors would be absent. The 
mean of each dependent variable across sub-
jects for each analysis of variance cell could 
thus be taken as an estimate of an experi- 
mental effect. Then the correlation between 
pairs of such estimates for any two dependent 
variables across all cells was used as an indi-
cant of their interchangeability, that is, as a 
means of deciding whether the experimental 
effects of this pair of dependent variables 
agreed within a linear transformation. Al-
though Whimbey and Denenberg (1966) 
went on to factor the intercorrelations of the 
set of dependent variables, the added com-
plexities thus introduced are not discussed 
here nor are problems of statistical inference 
that arise in such a study. It is merely sug- 
gested that the unusual design of this study 
permits empirical support for the hypothesis 
that a set of dependent variables are inter- 
changeable. By example, however, it is not 
difficult to show how data from an uncon- 
trolled observational approach could lead to 
faulty conclusions. Suppose that the inde- 
pendent variable, amount of handling, is re- 
lated to the dependent variable, defecation 
rate, but not to activity level and that feed- 
ing condition (together or separate) is related 
to activity level, but not to defecation rate. 
If, when rats are fed together, they are han- 
dled and are not handled when fed sepa- 
rately, it is readily seen that two unrelated 
dependent variables might appear to be inter- 
changeable. Two interchangeable dependent 
variables could fail to be identified as such 
simply because all animals received the same 
treatments on the relevant independent varia- 
bles, and no reliable variation was generated. 
The idea of a construct in this study has no 
necessary relationship to individual differ- 
ences. If treatment effects and repetitions 
could be legitimately obtained on the same 
subject, a study in support of such a con- 
struct could in theory be completed on a 
single subject, since estimates of experimental 
effects for several dependent variables under 
a variety of experimental conditions are the 
basic needs. Such a study might then be re- 
peated with other subjects, and the hypothesis 
concerning the construct could receive further 
support whether individual differences were 
substantial or all subjects were homogeneous. 
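
The handling and feeding example above can be sketched numerically. This is a minimal illustration, not the procedure of Whimbey and Denenberg (1966): the effect sizes, the noise-free cell means, and the function names are all hypothetical. It correlates the cell means of two dependent variables across cells, first for an orthogonal design and then for a confounded one in which handling and feeding condition always vary together.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cell_means(handled, fed_together):
    """Hypothetical noise-free cell means (assumed effect sizes)."""
    defecation = 10.0 - 3.0 * handled    # depends on handling only
    activity = 5.0 + 2.0 * fed_together  # depends on feeding condition only
    return defecation, activity


def correlation_across_cells(cells):
    """Correlate the two dependent variables over the cells of a design."""
    means = [cell_means(h, f) for h, f in cells]
    return pearson([m[0] for m in means], [m[1] for m in means])


# Orthogonal design: every combination of handling and feeding occurs.
orthogonal = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Confounded design: rats fed together are always handled, others never.
confounded = [(0, 0), (1, 1)]

r_orthogonal = correlation_across_cells(orthogonal)
r_confounded = correlation_across_cells(confounded)
print(r_orthogonal, r_confounded)
```

In this idealized case the two dependent variables are uncorrelated across the cells of the orthogonal design but perfectly (here negatively) correlated across the cells of the confounded design, which is the sense in which unrelated dependent variables might appear to be interchangeable.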
Current associative measures of word relatedness are criticized for being in- 
sufficiently versatile and for having inadequate statistical justification. This 
criticism is extended to a widely used measure, the mutual frequency score, 
which has been held to be based on the product-moment correlation. Measures 
of the reliability of free association are criticized on a similar basis. The 
reliability of free associations has typically been measured as the tendency of 
subjects to respond consistently on retesting. It is argued that the stability of 
the distribution of associations to a given stimulus, regardless of the behavior 
of individual subjects, is of greater interest and application. It is maintained 
that an association measure legitimately based on the product-moment cor- 
relation, and appropriate for measuring both word relatedness and the reliabil- 
ity of associations, is much needed. Such a measure is described and is claimed 
to be a very general measure, applicable to the measurement of any form of 
associative relationship. The experimental performance of the measure, in the 
prediction of free recall and the measurement of reliability, is summarized. 


The widespread use of free-association data in experimental, nonclinical applications, is 
mainly a phenomenon of the last 20 years. The applications have been numerous, especially 
those involving the use of comparisons of associative response distributions to two or 
more stimuli, based on group data. Similarities between associative distributions have 
been used to index, for instance, the number of perceived categories in a word list 
(Willner & Reitz, 1965), social class differences in France and the United States 
(Rosenzweig, 1964; Rosenzweig & Miller, 1966), the perceived semantic similarity of pairs 
of adjectives (Cofer, 1957), the difference in associa[...] of single and multiple 
stim[...] [...] application of associa[...] has undoubtedly been to the prediction of 
recall. There are many ways of measuring recall, and association data have been used to 
index most of them, including the number of words recalled from a list, the number 
recalled from one or more [...], the number of conceptual categories represented in 
recall, the tendency of words to cluster in recall, etc. All these predictive tasks, and 
others, have required the association data to be treated in different ways, although in 
every case it has been the similarity of associative distributions that [...] interest. 
Thus, particular measures have been developed to describe the relation between pairs of 
words (Bousfield & Puff, 1965; [...] Jenkins & Russell, 1952), the [...] words (Deese, 
1959a, [...] 1964, 1965; [...] 1963), the cohesiveness of sublists in longer lists 
(Bousfield, Steward, & [...], 1964), the relation of one word to [...] on a list 
(Rothkopf & Coke, 1961a, 1961b), [...] 


studies have been performed, and in any event 
it is difficult to compare results of studies 
using different measures. 

Two aspects of the confused situation are 
noteworthy. The first is that only a very few 
of the many association measures are closely 
related to theories of cognition or associative 
functioning (e.g., Bousfield, 1953; Deese, 
1965); the rest are simply empirical generali- 
zations, often developed ad hoc. The second is 
that none of the currently popular association 
measures are explicitly derived from, or con- 
sistent with, statistical and measurement 
theory. Association distributions are certainly 
amenable to rigorous statistical analysis; 
nevertheless, the popular word association 
measures of bivariate correlation (Bous-
field, Whitmarsh, & Berkowitz, 1960; P. M. 
Jenkins & Cofer, 1957) and of variance 
(Brotsky & Linton, 1967; Horvath, 1963), 
as well as of test-retest reliability (Gegoski 
& Riegel, 1967; Hall, 1966), have been 
formulated without reference to the available 
and appropriate statistical models. Inevitably, 
the reliability, validity, and generality of the 
measurements have suffered as a result. In 
particular, elaborate statistical analyses per- 
formed upon the results obtained with such 
measures (e.g., Deese, 1962, 1964, 1965; 
Howe, 1966) may be questioned if based upon 
incorrect assumptions about the derived data. 
For purposes of factor analysis, for example, 
it is essential to have measures that behave in 
accordance with the requirements of the sta- 
tistical model used in analysis. 

Problems in measuring the strength and 
pattern of free associations seem to be cen- 
tered around the presence of too many mea- 
sures and too little coordination of effort. 
Problems in measuring the reliability of asso-
ciations have almost exactly the opposite 
focus; one measure has been dominant, and 
effort has been coordinated all too well. Rela-
tively little experimental work has been done 
on the reliability of associations. The few 
studies that have been performed have all 
studied test-retest reliability, and have all 
used a measure which is both methodologi-
cally and statistically questionable. 

The measure that has been used in previous 
studies is the proportion of identical associ-
ations. It is defined as the proportion of sub-
jects who give the same response to a given 
stimulus on two consecutive administrations 
of a word association test. Hall (1966) studied 
the 1- and 3-week test-retest reliability of 54 
words chosen randomly from the Thorndike 
and Lorge (1944) word count. The mean pro- 
portion of identical associations for his 61 
subjects was .50 over the 54 words, and was 
unrelated to retest interval or word fre- 
quency. Brotsky and Linton (1967) studied 
the 10-week test-retest reliability of the 100 
Kent-Rosanoff Free Association Test words 
and of 58 words from the Connecticut norms;³ 
they found a mean proportion of identical 
associations of only .32. They attributed the 
lower proportion they found, compared with 
that of Hall (1966), to the greater period of 
time between their tests and to the higher 
average word frequency in their lists. They 
found a correlation of —.34 between word 
frequency and proportion of identical associ- 
ations. Gegoski and Riegel (1967) reported a 
mean proportion of identical associations of 
.28 on a similar task. 

The proportion of identical associations to 
a stimulus word measures the mean within- 
subject stability over trials of responses to 
that word. Often, however, the question of 
interest may be the stability over trials of an 
entire distribution of responses, without re- 
gard to the response consistency of individual 
subjects. 
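
The distinction can be made concrete with a small sketch. The responses below are invented for illustration, and the function name is an assumption rather than anything used in the studies cited. The example computes the proportion of identical associations for one stimulus word and, separately, checks whether the pooled response distribution changed between administrations.

```python
from collections import Counter


def proportion_identical(first, second):
    """Share of subjects who gave the same response on both administrations.

    `first` and `second` are lists of responses aligned by subject.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty, equally long response lists")
    same = sum(1 for a, b in zip(first, second) if a == b)
    return same / len(first)


# Hypothetical responses of four subjects to one stimulus word.
test_1 = ["mouse", "cheddar", "burger", "swiss"]
test_2 = ["cheddar", "mouse", "burger", "swiss"]

p_identical = proportion_identical(test_1, test_2)
same_distribution = Counter(test_1) == Counter(test_2)
print(p_identical, same_distribution)
```

In this invented case only half of the subjects repeat their own response, yet the distribution of responses across subjects is exactly the same on both occasions.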

The reliability of an entire distribution of 
associations is particularly relevant to mea- 
sures of word association computed from re- 
sponse data pooled across subjects, as almost 
all association measures are. If the response 
distributions to two stimuli are found to be 
unstable, then the level of similarity between 
the distributions will necessarily be unstable 
as well, and not amenable to interpretation. 
Reliability information about the entire dis-
tributions is thus appropriate and necessary. 
Reliability of the overall distribution within 
the sample is not, however, necessarily related 
to the proportion of identical associations, 
and cannot be estimated from it a priori. 


³ W. A. Bousfield, B. H. Cohen, C. A. Whitmarsh, 
& W. D. Kinkaid. The Connecticut free associational 
norms. (Tech. Rep. No. 35, Contract Nonr 631 (00)) 
Office of Naval Research, 1961. 

It would naturally be very useful to have a 
single association measure versatile enough to 
measure all the kinds of relation described 
above, and statistically sound enough to allow 
confident interpretation and comparison of 
experimental results. It would be desirable if 
the same measure could be used as a reliabil-
ity measure. Such an association measure 
would have to be a legitimate form of the 
product-moment correlation. It is only in the 
case of the product-moment correlation (and 
covariance) that the matrix of bivariate rela-
tions contains all the information necessary to 
determine every succeeding level of multi-
variate relations among the variables. This 
property is essential if multivariate statistics 
computed from the bivariate measures are to 
be accorded any confidence. The product-
moment correlation is, in addition, sufficiently 
flexible to express all the popular forms of 
relation: pair bonding (the simple bivariate 
r), interrelatedness of all the members of a 
list or sublist (mean squared r or the determi-
nant), interrelatedness of two lists or sublists 
(canonical correlation), and relation between 
one word and any number of others (squared 
multiple r). Finally, the product-moment cor-
relation is almost universally applied as a 
measure of reliability. 


COMMON ELEMENTS CORRELATION 


There is one popular bivariate association 
index that has been considered a form of the 
product-moment correlation. This is the mu-
tual frequency index. The mutual frequency 
for two stimuli S_1 and S_2 is defined as the 
number of responses given in common to S_1 
and S_2 divided by the total number of re-
sponses given to them. The mutual frequency 
can range from 0 to 1; it is represented as: 


$$ MF_{ij} = \frac{\sum^{m} f_c}{n} $$

where MF_{ij} = mutual frequency for stimuli i and j; f_c is the common frequency of each 
response to both i and j (the common frequency is the smaller of the two frequencies); 
m is the number of responses given to both i and j; n is the number of subjects. 
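
As a check on the definition, the index can be computed for the hypothetical frequencies of Table 1. This is a minimal sketch: the function name and the dictionary layout are assumptions made for illustration, and the denominator is taken to be 200, the sum of frequencies shown for each stimulus in Table 1.

```python
def mutual_frequency(freq_i, freq_j, n):
    """Sum of common frequencies (the smaller of each pair) divided by n."""
    responses = set(freq_i) | set(freq_j)
    common = sum(min(freq_i.get(r, 0), freq_j.get(r, 0)) for r in responses)
    return common / n


# Response frequencies to the stimuli "cheese" and "cottage" (Table 1).
cheese = {"burger": 50, "cheddar": 30, "cheese": 100, "cottage": 10, "mouse": 10}
cottage = {
    "burger": 10, "cheddar": 10, "cheese": 20, "cottage": 100, "cow": 10,
    "cream": 10, "dairy": 10, "mouse": 10, "rat": 10, "swiss": 10,
}

mf = mutual_frequency(cheese, cottage, n=200)
print(round(mf, 2))
```

The common frequencies sum to 60, so the sketch reproduces the value of 60/200, or .30, reported in the text for Table 1.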
The mutual frequency index was introduced 
by P. M. Jenkins and Cofer (1957) and was 
given a rationale and description by Bous- 
field et al. (1960). Since then, it has become 
the most common association measure in use. 
None of the authors who first used the mutual frequency index made any claims about its 
statistical properties. Deese (1965), however, has shown that it is formally identical to 
a simple algebraic derivation from the common elements form of the product-moment 
correlation, and he has considered the mutual frequency index as a form of the common 
elements correlation. In situations where the common elements correlation is appropriate, 
it is a legitimate variant on the standard form of the product-moment correlation. If, 
therefore, the common elements equation is appropriate to the comparison of free 
association response distributions, the mutual frequency index qualifies as the versatile 
and statistically sound measure that is desired. 

The mutual frequency index has been the preferred measure in factor-analytic studies of 
association. It has not been used as a measure of reliability, but it easily could be. 
The reliability coefficient for an associative distribution would be the mutual frequency 
between two response distributions to the same stimulus. It could be computed either on 
test-retest data or on data from two [...] samples. Rosenzweig (1964; Rosenzweig & 
Miller, 1966) has used the mutual frequency index in this latter way, although he was 
not investigating reliability as such. 

The common elements correlation was first suggested by Spearman (1904); the eventual 
development, application, rationale, and formal derivation were provided by [...] 
(1919, 1935, 1951), Kelley (192[...]), and Peters and Van Voorhis (1940). The common 
elements formula is appropriate in a situation in which the response to a stimulus [...] 
may be considered a random sampling from a pool of equally available elements. Each 
response is partitioned into those elements unique to it and those that it shares with 
other responses. Calling the common elements c, and calling the unique elements from 
responses A and B, a and b, respectively, then A = a + c and B = b + c. The correlation 
is the proportion of common elements (c) to the total, when summed over individuals: 


bM 
Qa 4- b+ Ye) 


The common elements equation was never 
intended for general use; in deriving it from 
the general form of the product-moment cor- 
relation, two special and (in the present case) 
unrealistic requirements must be met. The 
first is that the element sets a, b, and c must 
all be uncorrelated. The second is that the 
element sets must each be considered to have 
binomial variances of .5. (Peters & Van 
Voorhis [1940, pp. 118-123] discuss the ne-
cessity of both these special requirements for 
the algebraic derivation to be valid.) 

It is very unlikely that the first require- 
ment ever can be met, as a consequence of 
the way the common elements formula is ap- 
plied to the comparison of association distri- 
butions. In Table 1 two very simplified and 
hypothetical response distributions are shown, 
along with their partitioned elements. The 
mutual frequency index between the two 
response distributions is 60/200, or .30. In al- 
most every case where there is any “overlap” 
between the distributions there is also some 
remaining overlap between c and either a or b 
after partitioning. Complete independence of 
a, b, and c, in the required sense of no over- 
lap between the partitioned elements, could 
occur only if the observed response frequen- 
cies A and B were exactly equal (as in the 
response frequencies of the response “mouse” 
in Table 1) in every case where any overlap 
occurred. Such an equality cannot be expected 
in any real data. 

The second requirement highlights the gen- 
eral inapplicability of the common elements 
model to the comparison of association distri-
butions. It is difficult to see just how the 
variance of the response distributions, in either 
whole or partitioned form, could be expressed 
as a binomial variance. The variance of the 
frequency of occurrence of each response, ex-
pressed as proportion, is of course a binomial 
variance; similarly, the variance of the pro-
portional frequency of each partitioned ele-
ment is binomial. However, for the binomial 
variance of each set of elements (i.e., of each 
a, b, and c element derived from one re-
sponse) to be equal, the three elements would 
have to have equal frequencies, and it is im-
possible for them to do so. 


TABLE 1 

HYPOTHETICAL RESPONSE DISTRIBUTIONS AND 
PARTITIONED RESPONSE VECTORS FOR 
TWO STIMULI 

                                 Stimulus 
                        Cheese            Cottage 
Response                A      a      c      b      B 
Burger                 50     40     10      0     10 
Cheddar                30     20     10      0     10 
Cheese                100ᵃ    80     20      0     20 
Cottage                10      0     10     90    100ᵃ 
Cow                     0      0      0     10     10 
Cream                   0      0      0     10     10 
Dairy                   0      0      0     10     10 
Mouse                  10      0     10      0     10 
Rat                     0      0      0     10     10 
Swiss                   0      0      0     10     10 
Sum of frequencies    200    140     60    140    200 

ᵃ [...] the reliability of the associative distributions (cf. Rosenzweig, 1964). 

The mutual frequency index, then, fails to 
satisfy the statistical model for the common 
elements correlation and thus cannot be con- 
sidered a legitimate product-moment correla-
tion. The criticism concerning lack of statis-
tical justification applies to it in full force. 
Indeed, this criticism is most serious in the 
case of the mutual frequency, since it has the 
appearance of a bivariate correlation and has 
been erroneously assumed to be a legitimate 
correlation in applications to, for example, 
factor analysis. To date, none of the reported 
studies that have included factoring of mu-
tual frequency matrices (Deese, 1962, 1964, 
1965; Howe, 1966) have questioned the 
status of the mutual frequency as a correla- 
tion coefficient. 


CONDITIONAL PROBABILITIES CORRELATION 
AND COVARIANCE 


In any task calling for single, discrete, 
separable responses to each of a number of 
stimuli, the response distribution can be con- 
sidered a set of conditional probabilities. The 
responses are described as conditional upon 
the occurrence of each particular stimulus. 
This situation exists with word association 
data, and with data such as those from forced- 
choice psychophysical tasks. 

With m unique responses to each stimulus, 
each set of conditional probabilities describes 
a vector in m space. The cosine of the angle 
between any two vectors is the correlation be- 
tween the two vectors and can be taken as a 
measure of the similarity of the two stimuli 
(Rosner, 1956). Using Rosner’s notation, the 
equation for the correlation coefficient from 
the conditional probabilities is: 


$$ r_{ij} = \frac{\sum_k p_i(k)\, p_j(k)}{\left( \sum_k p_i(k)^2 \, \sum_k p_j(k)^2 \right)^{1/2}} $$

where p_i(k) is the probability of response k, given stimulus i; p_j(k) is the probability 
of response k, given stimulus j; \sum_k p_i(k)^2 is the sum of squares of proportions on 
the ith response vector (i.e., the sum of squares of the proportional frequencies of the 
responses to the ith stimulus). As the numerator is the sum of cross-products of 
proportional frequencies, the measure is always positive or zero. It can range from 0 to 
1. Rosner (1956) discussed the ways in which this measure is equivalent to and different 
from the traditional Pearson's product-moment correlation. 
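
The computation can be sketched for the hypothetical distributions of Table 1. The function names and the dictionary layout are assumptions made for illustration; only the formula itself comes from the text.

```python
from math import sqrt


def conditional_probabilities(freq):
    """Convert response frequencies to proportions that sum to 1."""
    total = sum(freq.values())
    return {response: f / total for response, f in freq.items()}


def cp_correlation(p_i, p_j):
    """Cosine of the angle between two conditional probability vectors."""
    responses = set(p_i) | set(p_j)
    cross = sum(p_i.get(k, 0.0) * p_j.get(k, 0.0) for k in responses)
    ss_i = sum(v * v for v in p_i.values())
    ss_j = sum(v * v for v in p_j.values())
    return cross / sqrt(ss_i * ss_j)


# Response frequencies to the stimuli "cheese" and "cottage" (Table 1).
cheese = {"burger": 50, "cheddar": 30, "cheese": 100, "cottage": 10, "mouse": 10}
cottage = {
    "burger": 10, "cheddar": 10, "cheese": 20, "cottage": 100, "cow": 10,
    "cream": 10, "dairy": 10, "mouse": 10, "rat": 10, "swiss": 10,
}

r_ij = cp_correlation(conditional_probabilities(cheese),
                      conditional_probabilities(cottage))
print(round(r_ij, 3))
```

For these frequencies the sum of cross-products is 3,900 and the sums of squares are 13,600 and 11,200 in raw frequency units, so the coefficient is about .32, close to but not identical with the mutual frequency of .30 for the same data.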

The conditional probabilities correlation is a general measure, and is statistically 
appropriate for use with any distribution of discrete responses. It is the appropriate 
variant on the standard pr[...] [...]nosis. No previous word association studies have 
been found in which it is used. 

Like the mutual frequency index, the conditional probabilities correlation can also be 
used as a reliability measure. [...] proportion of identical associations, both [...] be 
used to measure split-sample, as well as test-retest, reliability. In split-sample, or 
scale, reliability, the response distribution collected from one-half the subjects is 
compared with that collected from the other half. Scale reliability is somewhat analogous 
to split-half reliability, in which the responses from item subsets, rather than from 
subject subsets, are compared. Noble (1955) has described the rationale for the 
measurement of scale reliability and has shown how the Spearman-Brown correction applies. 
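
Scale reliability can be sketched as follows. The responses are invented, the split into even- and odd-numbered subjects is an arbitrary choice made here so that the example is reproducible, and the correction is assumed to take the usual Spearman-Brown form for a doubled sample, 2r / (1 + r); Noble's (1955) own procedure may differ in detail.

```python
from collections import Counter
from math import sqrt


def cp_correlation(freq_a, freq_b):
    """Conditional probabilities correlation of two response distributions."""
    total_a, total_b = sum(freq_a.values()), sum(freq_b.values())
    p_a = {k: v / total_a for k, v in freq_a.items()}
    p_b = {k: v / total_b for k, v in freq_b.items()}
    cross = sum(p_a[k] * p_b[k] for k in p_a if k in p_b)
    ss_a = sum(v * v for v in p_a.values())
    ss_b = sum(v * v for v in p_b.values())
    return cross / sqrt(ss_a * ss_b)


def spearman_brown(r_half):
    """Step a half-sample correlation up to the full sample size."""
    return 2.0 * r_half / (1.0 + r_half)


# Hypothetical responses of eight subjects to one stimulus word.
responses = ["mouse", "mouse", "cheddar", "cheddar",
             "mouse", "mouse", "swiss", "cheddar"]

half_a = Counter(responses[0::2])  # subjects 1, 3, 5, 7
half_b = Counter(responses[1::2])  # subjects 2, 4, 6, 8

r_half = cp_correlation(half_a, half_b)
r_scale = spearman_brown(r_half)
print(round(r_half, 3), round(r_scale, 3))
```

With real data the two halves would ordinarily be formed at random; the fixed split is used here only to keep the illustration deterministic.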
In some situations it may be valuable to consider the conditional probabilities 
covariance. Correlations are computed by adjusting the covariance to correct for the 
effect of unequal variances in the two distributions. Usually, it is desirable to correct 
for unequal variances, as the metric of the variables is frequently arbitrary (e.g., 
inches or [...]) and is often incommensurable for the two variables (e.g., inches and 
seconds). In the case of conditional probabilities derived from word associations, the 
metric is the same for all variables and is defined by the response distribution itself. 
The variances of the distributions are therefore comparable and [...] real information 
about the shape of the distributions. Very high variance indicates that each of a small 
number of responses was emitted by many subjects; very low variance indicates that many 
subjects each gave different responses to the stimulus word. 

For a bounded distribution of proportions, the variance is the sum of squares of the 
proportions; the covariance formula is simply the numerator of the correlation equation 
given above, and equals the sum of the cross-products of the conditional probabilities. 
The conditional probabilities covariance was used by Rothkopf (1960) to analyze [...] 
to pictures of tools. 
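
The behavior of the variance and covariance can be illustrated with invented distributions. The function names and the three distributions below are assumptions made for this sketch; the definitions (sum of squared proportions, sum of cross-products) are those given in the text.

```python
def cp_variance(p):
    """Sum of squares of the conditional probabilities."""
    return sum(v * v for v in p.values())


def cp_covariance(p_i, p_j):
    """Sum of cross-products of the conditional probabilities."""
    return sum(p_i[k] * p_j[k] for k in p_i if k in p_j)


# Hypothetical distributions: one concentrated, one widely dispersed.
concentrated = {"cheese": 0.9, "mouse": 0.1}
dispersed = {"response_%d" % k: 0.1 for k in range(10)}
other = {"cheese": 0.5, "cow": 0.5}

var_concentrated = cp_variance(concentrated)
var_dispersed = cp_variance(dispersed)
cov_example = cp_covariance(concentrated, other)
print(var_concentrated, var_dispersed, cov_example)
```

The concentrated distribution has a variance of about .82 and the dispersed one a variance of about .10, which matches the interpretation of high and low variance given above.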


EXPERIMENTAL PERFORMANCE OF CONDI-
TIONAL PROBABILITIES MEASURES AND 
MUTUAL FREQUENCY INDEX 


In an unpublished study by Mackenzie,⁴ 
283 undergraduate subjects were required to 
write single-response free associations to 126 
stimulus words. One week later, the same sub-
jects were given an oral presentation of 1 of 
10 lists of 26 words selected from the free-
association list and were required to reproduce 
as much of their 26-word list as possible in 
free recall. Recall was measured in two ways, 
as (a) the average number of words recalled 
from each of the 10 word lists and (b) the 
proportion of subjects who recalled each word 
from each list. Association measures based 
on the conditional probabilities measures and 
on the mutual frequency index were computed 
from the free-association responses and were 
used to predict both measures of free recall. 
To provide data for comparison of the reli-
ability measures, the subjects wrote the word 
association test again, 2 weeks after the first 
administration. 

To predict the mean number of words recalled from each word list, the mean of the 
bivariate indices describing relationships within each word list was used as a predictor. 
The mean value was chosen so as to facilitate comparison with the results of previous 
studies (e.g., Deese, 1965). The correlations among the mean values of the association 
measures over 10 lists, and the number of words recalled from each list, are shown in 
the upper half of Table 2. All three predictive measures are highly correlated and 
perform almost equivalently. The high correlations among the predictors do not, of 
course, refer to similarities between the bivariate indices themselves, but to 
similarities between the [...]roids. There is no significant difference in predictive 
power between measures. [...] consonant with those reported [...] on the predictive 
power of the mutual frequency (Deese, 1965). 


⁴ B. D. Mackenzie. A comparison of word association measures as predictors of recall, 
and an assessment of the reliability of free associations. Unpublished master's thesis, 
Simon Fraser University, Burnaby, British Columbia, 1969. 


TABLE 2 

CORRELATIONS AMONG DERIVED ASSOCIATION 
MEASURES AND RECALL 

Complete lists 

Measure       r      cov     MF     Recall 
r           1.00 
cov          .99    1.00 
MF           .96     .93    1.00 
Recall       .48     .76     .86    1.00 

Individual words 

Measure       r      cov     MF     Recall 
r           1.00 
cov          .96    1.00 
MF           .97     .93    1.00 
Recall       .53    [...]   [...]   1.00 

Note. MF = mutual frequency. 


For the prediction of recall of each word in 
a list, the squared multiple correlation of 
each word with the rest of the words in the 
list was used, with the squared multiple cor- 
relation computed individually from each of 
the associative measures. The correlations 
among these derived measures, and with re- 
call, are shown in the lower half of Table 2. 
Again, the performances of the measures are 
very similar. 
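
The squared multiple correlation of each word with the remaining words can be obtained from a matrix of bivariate coefficients. The sketch below uses the standard identity that the squared multiple correlation of variable i equals 1 - 1 / c_ii, where c_ii is the ith diagonal element of the inverse of the correlation matrix; whether the original study computed it this way is not stated, and the small matrix is hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_multiple_correlations(corr):
    """Squared multiple correlation of each variable with all the others."""
    inverse = invert(corr)
    return [1.0 - 1.0 / inverse[i][i] for i in range(len(corr))]


# Hypothetical two-word "list": the result must equal the squared bivariate r.
corr = [[1.0, 0.6],
        [0.6, 1.0]]
smc = squared_multiple_correlations(corr)
print([round(v, 2) for v in smc])
```

The identity presupposes a proper, invertible correlation matrix, which is one reason the statistical status of the bivariate index matters for analyses of this kind.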

For comparison of reliability measures, the 
conditional probabilities test-retest reliabil- 
ity and (on the first test administration) 
scale reliability with the Spearman-Brown 
correction, and the proportion of identical 
associations, were calculated for each stimulus 
word. The reliability values for a sample of 
the words are shown in Table 3. The correla- 
tions among the three measures are shown in 
Table 4. 

Both the test-retest reliability and the 
scale reliability are very high, with means 
over the 126 words of .94 and .96, respec- 
tively, and standard deviations of .04 and .03. 
With the use of these measures, therefore, 
considerable confidence can be placed in the 
response distributions of a great majority of 
the stimulus words. It is also noteworthy, 
however, that a few of the words have rather lower reliabilities; 4 of the 126 have 
reliabilities lower than .80 as measured with one of these two measures. These lower 
reliabilities would serve as a check on the interpretation of between-word correlations 
involving these words. 


TABLE 3 

RELIABILITY INDICES FOR TEN STIMULUS WORDS 

                         Measure 
Word        Proportion of identical   Scale rᵃ   Test-retest r 
            associations 
Burger            [...]                 .99         .99 
Cheddar            .94                  .99         .99 
Cheese             .20                  .93         .88 
Cottage            .37                  .97         .89 
Cow               [...]                 .98         .97 
Cream              .20                  .97         .18 
Dairy             [...]                 .95         .97 
Mouse              .31                  .96         .94 
Rat                .31                  .98         .97 
Swiss              .58                  .99         .99 
Meanᵇ              .33                  .96         .94 
SDᵇ                .12                  .03         .04 

ᵃ With Spearman-Brown correction. 
ᵇ From entire sample of 126 words. 


The proportion of identical associations, considered as a reliability measure comparable 
to other reliability indices, presents a very different picture of the reliability of the 
response distributions. The mean proportion of identical associations is only .33, far 
less than the mean of the other reliability measures, and has a substantially greater 
standard deviation of .12. The obtained mean proportion is, however, consonant with the 
proportions reported by other investigators (Brotsky & Linton, 1967; Gegoski & Riegel, 
1967). 

The correlations among the test-retest reliability, scale reliability, and proportion of 
identical association measures are only moderate in this sample. The correlation of only 
.61 between the two conditional probabilities measures is somewhat unexpected. Evidently, 
there are somewhat [...] conditional probabilities are more similar to each other, by 
virtue of having similar means and variance, than either is to the proportion of 
identical associations. If a researcher were to make a selection of words for 
experimental purposes on the basis of their [...] reliabilities, he would have to decide 
[...] what form of reliability was most suited to his needs. In the majority of 
experimental situations, and in all those in which a reliability coefficient equivalent 
to a product-moment correlation is appropriate, one of the conditional probabilities 
measures will be the preferred measure. 


TABLE 4 

CORRELATIONS AMONG RELIABILITY MEASURES 

Measure                         Proportion of identical   Scale 
                                associations               reliability 
Proportion of identical 
  associations                       1.00 
Scale reliability                    [...]                   1.00 
Test-retest reliability               .50                     .61 


CONCLUSION

For the prediction of two basic functions of free recall the
conditional probabilities measures have approximately the same
power as the mutual frequency index, which has hitherto been the
most powerful predictor of free recall to appear in the literature.
In addition, however, the conditional probabilities measures have
statistical properties that are always desirable and are frequently
essential, and can be applied with some confidence to a wide
variety of predictive and testing tasks. In particular, they are
both methodologically and statistically appropriate for the
measurement of the reliability of associations. They are
recommended, therefore, as general measures of association.


UNIVARIATE VERSUS MULTIVARIATE TESTS IN
REPEATED-MEASURES EXPERIMENTS


MICHAEL L. DAVIDSON


University of Rochester


The usual univariate test for the repeated-measures effect in a one-way design
rests upon an assumption of uniform variances and covariances. The standard
test of this assumption is shown to have acceptable power only when the
multivariate test of the hypothesis is essentially as powerful as the
univariate test. A modified form of the univariate test, not requiring the
assumption of uniformity, is compared to the multivariate test with respect to
power. Depending on the variance-covariance structure of the data and the
alternative hypothesis, the univariate test ranges from somewhat better to
much worse than the multivariate test. There are possibly interesting
experimental effects which the univariate test is virtually powerless to
detect.


Suppose that each of n subjects (or other randomly sampled
entities) has yielded a score on some dependent variable under
each of k fixed experimental conditions; let X_ij be the score of
the ith subject in the jth condition. Within the usual nonadditive
model, there are two general methods for testing the hypothesis
that the several experimental conditions do not differentially
affect the dependent variable (Greenhouse & Geisser, 1959;
Scheffé, 1959).


Univariate Test 


Sums of squares for Treatments and Treatments × Subjects
interaction are computed according to:

\[ SS_T = n \sum_{j=1}^{k} (\bar{X}_{.j} - \bar{X}_{..})^2 \]

and

\[ SS_{T \times S} = \sum_{i=1}^{n} \sum_{j=1}^{k} (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2, \]

where the dot denotes averaging over the subscript that it
replaces. The test statistic

\[ F_U = \frac{SS_T / (k - 1)}{SS_{T \times S} / [(n - 1)(k - 1)]} \qquad [1] \]

is referred to the F distribution with (k − 1) and (n − 1)(k − 1)
degrees of freedom, where U stands for univariate.
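The computation in Equation 1 can be sketched in a few lines of Python. This is only an illustration: the function name `univariate_f` and the list-of-rows layout are assumptions made here, not part of the original article. The scores are the Case B data that appear later in Table 4, for which the article reports F_U = 13.28.

```python
def univariate_f(scores):
    """Return (F_U, df1, df2) for an n x k table of scores (rows = subjects)."""
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    subj_means = [sum(row) / k for row in scores]
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Treatments sum of squares
    ss_t = n * sum((m - grand) ** 2 for m in cond_means)
    # Treatments x Subjects interaction sum of squares
    ss_ts = sum(
        (scores[i][j] - subj_means[i] - cond_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    df1, df2 = k - 1, (n - 1) * (k - 1)
    return (ss_t / df1) / (ss_ts / df2), df1, df2


# Case B scores from Table 4 (columns are conditions 1, 2, 3).
CASE_B = [
    (49, 53, 100), (53, 49, 120), (63, 65, 74), (37, 33, 44), (39, 39, 68),
    (43, 51, 96), (43, 47, 34), (49, 45, 56), (65, 65, 114), (59, 53, 84),
]

f_u, df1, df2 = univariate_f(CASE_B)
print(round(f_u, 2), df1, df2)
```

The unmodified degrees of freedom are returned here; the modified test discussed later in the article changes only the degrees of freedom, not the statistic.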


ard E. Ware in the development of 


Psychology, University of 
7. 


Multivariate Test

Any set of (k − 1) linearly independent contrasts is chosen (the
test statistic is independent of the choice of contrasts), and a
score is computed for each subject on each contrast, for example,

\[ Y_{ij} = X_{ij} - X_{ik}, \qquad j = 1, 2, \ldots, k - 1. \qquad [2] \]

The means over subjects \bar{Y}_{.j} and the matrix S = [s_lm] of
variances and covariances of the Ys are computed, the latter
according to

\[ s_{lm} = \frac{1}{n - 1} \sum_{i=1}^{n} (Y_{il} - \bar{Y}_{.l})(Y_{im} - \bar{Y}_{.m}), \qquad l, m = 1, \ldots, k - 1. \qquad [3] \]

The matrix S is then inverted; letting s^{lm} be the typical
element of S^{-1}, the test statistic

\[ F_M = \frac{n(n - k + 1)}{(n - 1)(k - 1)} \sum_{l=1}^{k-1} \sum_{m=1}^{k-1} s^{lm} \bar{Y}_{.l} \bar{Y}_{.m} \]

is referred to the F distribution with (k − 1) and (n − k + 1)
degrees of freedom, where M stands for multivariate.
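A minimal sketch of the multivariate statistic follows, using the contrasts of Equation 2 (each condition minus the last). The helper `solve` and the function name `multivariate_f` are illustrative choices made here; instead of inverting S explicitly, the sketch solves S w = Ȳ, which gives the same quadratic form. The data are again the Case B scores of Table 4, for which the article reports F_M = 6.06 with 2 and 8 degrees of freedom.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    m = len(b)
    aug = [list(a[r]) + [b[r]] for r in range(m)]
    for c in range(m):
        # partial pivoting: bring the largest remaining entry to the diagonal
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(m):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[r][m] / aug[r][r] for r in range(m)]


def multivariate_f(scores):
    """Return (F_M, df1, df2) for an n x k table of scores (rows = subjects)."""
    n, k = len(scores), len(scores[0])
    # contrast scores Y_ij = X_ij - X_ik, j = 1, ..., k - 1
    y = [[row[j] - row[k - 1] for j in range(k - 1)] for row in scores]
    ybar = [sum(r[j] for r in y) / n for j in range(k - 1)]
    # sample covariance matrix of the contrasts (divisor n - 1)
    s = [
        [sum((r[a] - ybar[a]) * (r[b] - ybar[b]) for r in y) / (n - 1)
         for b in range(k - 1)]
        for a in range(k - 1)
    ]
    w = solve(s, ybar)                       # w = S^{-1} ybar
    quad = sum(u * v for u, v in zip(ybar, w))
    f_m = n * (n - k + 1) / ((n - 1) * (k - 1)) * quad
    return f_m, k - 1, n - k + 1


CASE_B = [
    (49, 53, 100), (53, 49, 120), (63, 65, 74), (37, 33, 44), (39, 39, 68),
    (43, 51, 96), (43, 47, 34), (49, 45, 56), (65, 65, 114), (59, 53, 84),
]

f_m, df1, df2 = multivariate_f(CASE_B)
print(round(f_m, 2), df1, df2)
```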


STATISTICAL ASSUMPTIONS

Both the univariate and multivariate tests rest upon the
assumption that the k measures from the various experimental
conditions have, in the population, a multivariate normal
distribution (Scheffé, 1959). When k = 2, both are identical to
the usual t test for correlated observations. Otherwise, the
univariate test rests upon a further assumption, not underlying
the multivariate test. Let Σ_X = [σ_lm] be the matrix of
population variances and covariances of the k variables; Σ_X is
said to be uniform (Geisser, 1963) if (a) all the variances are
the same and (b) all the covariances are the same, that is,

\[ \sigma_{lm} = \begin{cases} \sigma^2 & \text{if } l = m, \\ \rho\sigma^2 & \text{if } l \neq m, \end{cases} \qquad \text{for } l, m = 1, \ldots, k. \qquad [4] \]


If Σ_X is not uniform, the univariate test wrongly rejects the
null hypothesis with a probability larger than that corresponding
to the critical value of F in the tables. Data illustrating this
effect are reproduced in Figure 1. Imhof's (1962) data are exact
probabilities for k = 5; the data from Collier, Baker, Mandeville,
and Hayes (1967) result from empirically obtained sampling
distributions when k = 4 and when there are not one but three
groups of subjects comprising a non-repeated-measures factor. The
parameter ε, after Greenhouse and Geisser (1959), indicates how
strongly the uniformity assumption is violated; ε = 1.0 if Σ_X is
uniform, and ε can never be smaller than 1/(k − 1). The actual
probability of the Type I error depends primarily on the value of
ε, in the fashion indicated by the solid lines in Figure 1, and
only weakly on n and other factors.
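The article does not print a formula for ε. The sketch below assumes the standard Box (1954) / Greenhouse and Geisser (1959) expression in terms of the entries of the variance-covariance matrix; the function names `covariance` and `epsilon` are illustrative. It checks the two properties stated above (ε = 1 for a uniform matrix, ε at least 1/(k − 1)) and evaluates ε for the Case B scores of Table 4, for which the article gives .52.

```python
def covariance(scores):
    """Sample variance-covariance matrix of the k conditions (divisor n - 1)."""
    n, k = len(scores), len(scores[0])
    means = [sum(r[j] for r in scores) / n for j in range(k)]
    return [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in scores) / (n - 1)
         for b in range(k)]
        for a in range(k)
    ]


def epsilon(cov):
    """Index of nonuniformity (assumed Box / Greenhouse-Geisser form)."""
    k = len(cov)
    diag_mean = sum(cov[j][j] for j in range(k)) / k
    row_means = [sum(row) / k for row in cov]
    grand = sum(row_means) / k
    num = (k * (diag_mean - grand)) ** 2
    den = (k - 1) * (
        sum(v * v for row in cov for v in row)
        - 2 * k * sum(m * m for m in row_means)
        + (k * grand) ** 2
    )
    return num / den


# A uniform matrix (Equation 4) with sigma^2 = 2 and rho = .3, k = 4.
uniform = [[2.0 if a == b else 0.6 for b in range(4)] for a in range(4)]

CASE_B = [
    (49, 53, 100), (53, 49, 120), (63, 65, 74), (37, 33, 44), (39, 39, 68),
    (43, 51, 96), (43, 47, 34), (49, 45, 56), (65, 65, 114), (59, 53, 84),
]

eps_uniform = epsilon(uniform)
eps_case_b = epsilon(covariance(CASE_B))
print(round(eps_uniform, 6), round(eps_case_b, 2))
```

Because ε is a ratio of quadratic terms in the covariances, it does not matter whether the matrix is computed with divisor n or n − 1.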

Moderately strong violations of uniformity do not seem unlikely
in practice. It appears that serious departures from uniformity
are especially likely when the data result from repeated
measurements over time and there is sequential dependency between
succeeding occasions of measurement. Danford, Hughes, and McNee
(1960) presented real data of this type, for which k = 10 and
ε = .412.

Two procedures have been suggested for modifying the univariate
test to gain the necessary control over α: (a) One may
statistically test the uniformity assumption and use the
univariate test only when it is not rejected; or (b) one may in
effect set the value of α used in referring to the tables of F
lower than its intended value, to compensate for increases due to
violations of the uniformity assumption. Let us consider these
procedures in turn.


[Figure 1: plotted points from Imhof (n = 5 and n = 9) and from
Collier et al. (n = 15 and n = 45).]

FIG. 1. Actual probabilities of the Type I error for the
univariate test.


TESTING THE ASSUMPTION OF UNIFORMITY 


Box (1950) proposed a test for uniformity which is sometimes
recommended (Danford et al., 1960; Winer, 1962). The power of
Box's test has not been studied, but a rough estimate can be
obtained as follows. Given a population variance-covariance matrix
Σ_X, suppose that a sample is selected with variance-covariance
matrix the same as Σ_X. (One might intuitively expect that random
samples would, due to sampling variability, tend to have less
uniform variance-covariance matrices than their parent population,
but Collier et al. [1967] presented data which suggest that such a
bias is appreciable only when ε is close to 1.) It is then
possible to find how large the sample must be for Box's test to
reject the uniformity assumption; this sample size will be
approximately that for which the test has power of .5 to detect
the alternative Σ_X chosen. Table 1 shows the results of such an
analysis for a variety of hypothetical variance-covariance
matrices. These calculations suggest that severe deviations from
uniformity can be detected by Box's test if n exceeds k by a few,
but that larger ns are necessary to detect more moderate
deviations. If one decides, in view of Figure 1, that the
univariate test should be avoided if ε is less than .7, then n
must exceed k by perhaps 20 for Box's test to be sufficiently
powerful. But if n is this large, one might as well use the
multivariate test, even if the uniformity assumption is met.

TABLE 1
APPROXIMATE SAMPLE SIZES FOR WHICH BOX'S TEST (α = .05) HAS
POWER OF .5 TO DETECT DEPARTURES FROM UNIFORMITY

k     Case No.     ε       Minimum n
3     1            .559    8
3     2            .566    8
3     3            .862    48
3     4            .871    19
6     5            .485    11
6     6            .616    16
6     7            .644    29
6     8            .713    14
6     9            .798    42
6     10           .852    62


Relative Power of Univariate and Multivariate 
Tests, Given Uniformity 


If Σ_X is in fact uniform, but the null hypothesis is false
(there are systematic differences among the k experimental
conditions), then the test statistics F_U and F_M both have
noncentral F distributions. The statistics F_U and F_M have
(k − 1) degrees of freedom for the numerator, (k − 1)(n − 1) and
(n − k + 1) degrees of freedom, respectively, for the denominator,
and a common noncentrality parameter δ given by:

\[ \delta^2 = \frac{n}{\sigma^2 (1 - \rho)} \sum_{j=1}^{k} (\mu_j - \mu_{.})^2, \qquad [5] \]

where σ² and ρ are the population variance and correlation of the
k measures (Equation 4), and μ_j is the population mean for the
jth experimental condition. [...] this distribution follows from
[...]. The expression in Equation 5 for δ² is easily obtained by
selecting the contrasts given in Equation 2, computing the
population means, variances, and covariances of the Ys with the
aid of Equation 4, and substituting these in the general
expression for the noncentrality parameter for F_M (Scheffé, 1959,
Section [...]).

Table 2 illustrates the relative power of the univariate and
multivariate tests when Σ_X is uniform; the alternative hypotheses
are expressed in terms of δ. It seems fair to say that the
multivariate test is nearly as powerful as the univariate test
when n exceeds k by 20 or more.
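As a small worked illustration of Equation 5, the following sketch evaluates δ² for a hypothetical set of population values. The function name `noncentrality_sq` and the numbers used (n = 10, means 0, 0, 3, σ² = 4, ρ = .5) are invented for the example and do not come from the article.

```python
import math


def noncentrality_sq(n, means, variance, rho):
    """delta^2 of Equation 5 under a uniform variance-covariance matrix."""
    grand = sum(means) / len(means)
    return n * sum((m - grand) ** 2 for m in means) / (variance * (1 - rho))


# Hypothetical values: deviations from the grand mean are -1, -1, 2,
# so the sum of squared deviations is 6 and delta^2 = 10 * 6 / (4 * .5).
delta_sq = noncentrality_sq(n=10, means=[0.0, 0.0, 3.0], variance=4.0, rho=0.5)
delta = math.sqrt(delta_sq)
print(delta_sq, round(delta, 3))
```

Note how a high correlation ρ among the measures shrinks the denominator and so increases δ² for a fixed set of mean differences, which is the usual advantage claimed for repeated-measures designs.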


Conclusion 


Box's test is of little value for deciding whether to use the
univariate test or the multivariate test. When n is small, one
cannot depend on Box's test to detect serious departures from
uniformity. When n is large, it would be statistically acceptable
to select the multivariate or univariate test according to
whether Box's test rejected or failed to reject


TABLE 2
RELATIVE POWER OF UNIVARIATE AND MULTIVARIATE TESTS WHEN
UNIFORMITY ASSUMPTION IS MET (α = .05)

Power of test (1 — B) 
k 5/NR | Testa me 
er Pr ee ae 
| 1 
3 |10| U 21 2; | 30 r^ 
M A2 22 | 28 | “a 
3 |20| U .66 s | 86 | 
M 30 70 | -83 | gor 
3 800 | y 95 99 | .996 “906 
M 52 96 | 99 | 
z | 46 
o | 10) U 39 42 | 45 | “0 
M Al 23 a 85 
6 15| U 47 81 p^ 78 
M 17 49 | 7 99 
6 | 20) U 97 38 | 92 gf 
M 25 76 | 95 ^ 
hi 
16° 5| U AT 17 ay m 
M .06 p [2 | 
16 10] U .68 6 | 15 156 
M 10 me | | 
16 | 15] U 98 | 98 1 |9 
M 16 53 
a U = univariate; M = multivariate.
b Entries are from Tang's (1938) [...].
c Entries are areas under noncentral [...] distributions,
computed numerically on the IBM 360-65 at the University of
Rochester Computing Center, with a [...] (Bargmann & Ghosh, 1964).


the uniformity assumption. But this plan is pointless, since the
multivariate test is as good as there is in either case. Finally,
Box's test is computationally as difficult as the multivariate
test, and is not useful in saving computational labor. One might
well use the univariate test if it is known a priori that Σ_X is
uniform, but this knowledge is uncommon in practice.

TABLE 3
RELATIVE POWER OF MODIFIED UNIVARIATE AND MULTIVARIATE
TESTS FOR VARIOUS ALTERNATIVES (α = .05)(a)

                                    Power of test (1 − β)
k   δ_M/√k  Test(b)  Case(c)  n = k+1      k+3          k+6          k+20         n = ∞
3   1.0     UV       A        .068         .108         [illegible]  [illegible]  [illegible]
            UV       B        .231         .291         .332         .381         .410
            MV       all      .119         [illegible]  .223         .283         .323
3   1.5     UV       A        [illegible]  .276         .355         .450         .507
            UV       B        .432         .553         .626         .700         .738
            MV       all      .198         .340         [illegible]  .527         .639
3   2.0     UV       A        .338         .524         .638         .750         .804
            UV       B        .643         .789         .857         .911         .934
            MV       all      .296         [illegible]  .696         .829         .883
6   1.0     UV       A        .025         .035         .047         .069         .094
            UV       B        .537         .576         .608         .654         .688
            MV       all      .106         .165         .228         .341         .432
6   1.5     UV       A        .147         .199         .251         .340         .420
            UV       B        .862         .894         .916         .942         .957
            MV       all      [illegible]  .324         [illegible]  [illegible]  .821
6   2.0     UV       A        .458         .561         .646         [illegible]  .829
            UV       B        .981         .989         .993         .997         .998
            MV       all      .255         .527         .749         .934         .980

a Entries are areas under noncentral [...] distributions, computed
numerically on the IBM 360-65 at the University of Rochester
Computing Center, with [...] (Bargmann & Ghosh, 1964). Parameters
for the univariate test are after Imhof (1962).
b UV = modified univariate; MV = multivariate.
c See text for explanation of code.


MODIFIED UNIVARIATE TEST

The usual method of adjusting α to compensate for its increase
when the uniformity assumption is violated is to alter the degrees
of freedom for the univariate test statistic. Greenhouse and
Geisser (1959) and Imhof (1962) independently showed (after Box,
1954) that when there are no differences among the experimental
conditions, F_U is distributed almost exactly as F with ε(k − 1)
and ε(n − 1)(k − 1) degrees of freedom, where ε is the index of
nonuniformity used in Table 1, etc. (The solid curves in Figure 1
are computed from this distribution with k = 5, n = 10.) Since ε
cannot be less than 1/(k − 1), Greenhouse and Geisser (1959)
suggested the conservative procedure of referring F_U as given by
Equation 1 to the F tables with 1 and (n − 1) degrees of freedom.
It is also possible to estimate ε from the sample, a less
conservative and apparently successful procedure (Collier et al.,
1967).
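The degrees-of-freedom adjustment just described can be written out as a short sketch. The function name `adjusted_df` and its argument order are assumptions made for illustration; when no value of ε is supplied, the sketch falls back on the conservative lower bound 1/(k − 1), which reproduces the 1 and (n − 1) degrees of freedom suggested by Greenhouse and Geisser.

```python
def adjusted_df(n, k, eps=None):
    """Degrees of freedom for referring F_U to the F tables.

    eps is the index of nonuniformity; None selects the conservative
    lower bound 1 / (k - 1).
    """
    lower = 1.0 / (k - 1)
    if eps is None:
        eps = lower
    if not lower <= eps <= 1.0:
        raise ValueError("eps must lie between 1/(k - 1) and 1")
    return eps * (k - 1), eps * (n - 1) * (k - 1)


conservative = adjusted_df(n=10, k=3)          # lower-bound procedure
unmodified = adjusted_df(n=10, k=3, eps=1.0)   # uniformity assumed
print(conservative, unmodified)
```

An estimated ε between these two extremes gives fractional degrees of freedom, so a critical value would then have to come from an F distribution routine rather than from printed tables.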

Since neither the modified univariate test nor the multivariate
test requires the uniformity assumption, a choice between them
depends upon which is the more powerful to detect effects of
interest. Unfortunately, the relative power of the two tests is
complexly determined; in addition to n, the relative power depends
upon (a) the degree to which Σ_X is uniform and (b) when Σ_X is
not uniform, the nature of the alternative hypothesis. It is
useful to consider four limiting cases.

TABLE 4
ILLUSTRATIVE DATA FOR CASES B-D(a)

              Case B            Case C            Case D
Subject    X1    X2    X3    X1    X2    X3    X1    X2    X3
1          49    53   100    52    50    71    51    51    92
2          53    49   120    56    46    91    55    47   112
3          63    65    74    66    62    45    65    63    66
4          37    33    44    40    30    15    39    31    36
5          39    39    68    42    36    39    41    37    60
6          43    51    96    46    48    67    45    49    88
7          43    47    34    46    44     5    45    45    26
8          49    45    56    52    42    27    51    43    48
9          65    65   114    68    62    85    67    63   106
10         59    53    84    62    50    55    61    51    76
Mean       50    50    79    53    47    50    52    48    71

a See text for explanation of code.


Case A: Uniform Σ_X


When the uniformity assumption is in fact met, the two test
statistics have the distributions given in the preceding section.
The modified univariate test with ε assumed to be 1/(k − 1) is
most conservative in this case, hence less powerful than the
unmodified test. Table 3 indicates that the univariate test is
then generally inferior to the multivariate test; the alternative
hypothesis is expressed in terms of δ_M, the noncentrality
parameter for F_M. The form of the univariate test in which ε is
estimated will compare more favorably, but is still somewhat
conservative (Collier et al., 1967) and thus inferior to the
multivariate test for large n at least.


Case B: Best for the Univariate Test 
When Σ_X is strongly nonuniform, the relative power of the two
tests depends on whether the Treatments (experimental conditions)
effect [...]. [...] the first two experimental conditions are
highly correlated (r12 = .90) and have small variance, whereas the
third has a larger variance and is less well correlated with the
first two (r13 = .54; r23 = .56). The value of ε computed from
this nonuniform set of variances and covariances is .52.

The Treatments × Subjects variance arises mainly from the
difference between Conditions 1 and 2 on the one hand and
Condition 3 on the other; in Case B, it is also the third
condition that has the deviant mean (Table 4) and hence is
responsible for the Treatments variance. The test statistics for
these data are F_U = 13.28 (df = 1/9 for the modified test) and
F_M = 6.06 (df = 2/8). Table 3 shows the relative power of the two
tests in the theoretical limit of Case B, in which F_U has a
noncentral F distribution, df = 1/n − 1, with noncentrality
parameter δ_M. This limit arises when (a) the experimental
conditions (or k linear combinations of them) divide into two
sets; (b) the data are perfectly correlated within each set and
are less well correlated between the sets (so that ε is minimal);
and (c) the treatment means are the same within each set but
differ between the sets. For this sort of alternative hypothesis,
the univariate test is always somewhat more powerful than the
multivariate test.


Case C: Best for the Multivariate Test 


Table 4, Case C, shows data for which the variances and
covariances are the same as in Case B, but in which the first two
experimental conditions differ by a small amount consistently
over subjects, and the third condition has the same mean as the
average of the first two. For these data, F_U = .43 and
F_M = 7.84; the multivariate test detects the small but reliable
difference. The theoretical limit of this case is one in which
F_U has a central F distribution, df = 1/n − 1. This happens when
the experimental conditions divide into two perfectly correlated
sets (as in Case B) and the treatment means are the same between
sets but differ very slightly within one of the sets. In this
limiting case, the univariate test is completely powerless
(power = .05 when α = .05) to detect the small but reliable
experimental effect. More generally, the univariate test is
relatively powerless to detect any reliable difference between
highly correlated experimental conditions when other less well
correlated conditions are present (Imhof, 1962, Lemma 4). The
multivariate test succeeds, since it is designed to search among
the data for that within-subject contrast which is maximally
reliable over subjects in the sense of having the maximum value of
the t statistic for correlated observations (Morrison, 1967).

TABLE 5
VARIANCES AND COVARIANCES FOR THE DATA IN TABLE 4 (All Cases)

Condition      1        2        3
1            87.4     80.2    140.2
2                     91.4    147.0
3                             758.6


Case D: Intermediate Nonuniform Case 

Cases B and C represent the extremes for the relative power of
the two tests, which occur when Σ_X is highly nonuniform. A
reasonable intermediate case is one in which the Treatment
differences are distributed among the conditions independently of
the source of the Treatment × Subjects interaction. For example,
the data in Table 4, Case D, show a small but reliable mean
difference between Conditions 1 and 2 and a larger but
approximately equally reliable mean difference between Condition
3 and the other two. The multivariate test is roughly equally
sensitive to these two effects, and F_M = 6.98. The univariate
test, sensitive only to the latter, has F_U = 7.15. In the
theoretical limit (minimal ε) for this intermediate case the
multivariate test is somewhat more powerful than the univariate
if n exceeds k by a few.
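The contrast between Cases C and D can be checked numerically from the scores in Table 4. The sketch below is specialized to k = 3 so that the 2 × 2 matrix S of Equation 3 can be inverted in closed form; the function name `f_statistics` is an illustrative choice, not the author's. The article reports F_U = .43 for Case C and F_U = 7.15, F_M = 6.98 for Case D.

```python
def f_statistics(scores):
    """Return (F_U, F_M) for an n x 3 table of scores (rows = subjects)."""
    n, k = len(scores), 3
    cond = [sum(r[j] for r in scores) / n for j in range(k)]
    subj = [sum(r) / k for r in scores]
    grand = sum(cond) / k
    # univariate statistic (Equation 1)
    ss_t = n * sum((m - grand) ** 2 for m in cond)
    ss_ts = sum((scores[i][j] - subj[i] - cond[j] + grand) ** 2
                for i in range(n) for j in range(k))
    f_u = (ss_t / (k - 1)) / (ss_ts / ((n - 1) * (k - 1)))
    # multivariate statistic with contrasts X1 - X3 and X2 - X3 (Equation 2)
    y1 = [r[0] - r[2] for r in scores]
    y2 = [r[1] - r[2] for r in scores]
    m1, m2 = sum(y1) / n, sum(y2) / n
    s11 = sum((a - m1) ** 2 for a in y1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in y2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # quadratic form ybar' S^{-1} ybar via the closed-form 2 x 2 inverse
    quad = (s22 * m1 * m1 - 2 * s12 * m1 * m2 + s11 * m2 * m2) / det
    f_m = n * (n - k + 1) / ((n - 1) * (k - 1)) * quad
    return f_u, f_m


CASE_C = [
    (52, 50, 71), (56, 46, 91), (66, 62, 45), (40, 30, 15), (42, 36, 39),
    (46, 48, 67), (46, 44, 5), (52, 42, 27), (68, 62, 85), (62, 50, 55),
]
CASE_D = [
    (51, 51, 92), (55, 47, 112), (65, 63, 66), (39, 31, 36), (41, 37, 60),
    (45, 49, 88), (45, 45, 26), (51, 43, 48), (67, 63, 106), (61, 51, 76),
]

fu_c, fm_c = f_statistics(CASE_C)
fu_d, fm_d = f_statistics(CASE_D)
print(round(fu_c, 2), round(fm_c, 2))
print(round(fu_d, 2), round(fm_d, 2))
```

For Case C the univariate ratio falls below 1 while the multivariate statistic is large, which is the pattern the text describes: the small, consistent difference between Conditions 1 and 2 is swamped in F_U by interaction variance contributed by Condition 3.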


Discussion 


When it is not known in advance that Σ_X is uniform, only the
modified univariate and multivariate tests give an investigator
the necessary control over the probability of a Type I error.
Provided that n exceeds k by a few, the modified univariate test
ranges, with respect to power, from somewhat better to much worse
than the multivariate test. The modified univariate test is
relatively poor in detecting the Case C type of alternative
hypothesis. It is often said that the advantage of
repeated-measures designs is their ability to reveal small but
reliable effects. An extension of this judgment would seem to
favor the multivariate test; whereas both tests eliminate subject
variance from the data, the multivariate test eliminates in
addition that portion of the Treatments × Subjects interaction
variance which comes from those contrasts that show no main
effect.


Very Small Samples 


The power of the multivariate test drops rapidly as n becomes
nearly as small as k; when n is smaller than k, F_M cannot be
computed. The modified univariate test remains valid. If Case C
alternatives are of special interest, it may be better to reduce
the number of experimental conditions represented in the data and
to use the multivariate test. One may group adjacent experimental
trials, for example (see also Marks, 1968).


Other Repeated-Measures Designs 


Most designs involving repeated measures 
can be analyzed with either univariate (Myers, 
1966; Winer, 1962) or multivariate (Bock, 
1963; Cole & Grizzle, 1966; Danford et al., 
1960; Greenhouse & Geisser, 1959; Morrison, 
1967) techniques. Both involve assumptions 
of multivariate normality and homogeneity of 
variances and covariances across experimental 
groups, if there is more than one (this assumption is of course
not the assumption of uniformity within groups across
conditions). If
repeated-measures tests (main effects, inter- 
actions, planned comparisons, etc.) involve 
only a single comparison (one numerator 
degree of freedom), the two methods are 
equivalent. Otherwise, the univariate tech- 
niques rest on special assumptions analogous 
to the uniformity assumptions; Greenhouse 
and Geisser (1959) presented the modified 
univariate test for a general case involving one 
repeated-measures factor. 

In general, the power of the two test statistics for a
particular repeated-measures test has not been adequately studied,
and uncritical extension of the conclusions for the one-way design
is unwarranted. From the descriptive statistics themselves,
however, it can be seen that the multivariate test will always be
relatively sensitive to Case C alternatives, in which a relatively
small but consistent effect on an experimental dimension E is
present in the context of a relatively large E × Subjects
interaction arising from other experimental conditions. The
dimension E may be a main repeated-measures effect, an interaction
between two repeated factors, etc.

In factorial designs, the tests for main 
repeated-measures effects are, for both meth- 
ods, essentially single-factor tests involving 
means that are taken over the other repeated- 
measures factors. Therefore the present results 
for the one-way design generalize to tests for 
main effects. 


CONCLUSIONS 


Because the commonly used univariate test 
is not robust with regard to the uniformity 
assumption, either the modified univariate or 
the multivariate test should be used. Among 
theoretically possible cases, the multivariate 
test is usually somewhat more powerful provided that n exceeds k
by a few. Large differences in power occur only when small but
reliable effects are present with effects highly 
variable but averaging to zero over subjects; 
the multivariate test is clearly preferable in 
such cases. But the modified univariate test 
is easier to compute, and the only alternative 
if n is very small. 


REFERENCES 


Bargmann, R. E., & Ghosh, S. P. Noncentral statistical
distribution programs for a computer language. Yorktown Heights,
N.Y.: IBM Watson Research Center, 1964.
Bock, R. D. Multivariate analysis of variance of repeated
measurements. In C. W. Harris (Ed.), Problems in measuring
change. Madison, Wis.: University of Wisconsin Press, 1963.
Box, G. E. P. Problems in the analysis of growth and wear curves.
Biometrics, 1950, 6, 362-389.
Box, G. E. P. Some theorems on quadratic forms applied in the
study of analysis of variance problems: I. Effect of inequality
of variance in the one-way classification. Annals of
Mathematical Statistics, 1954, 25, 290-302.
Cole, J. W. L., & Grizzle, J. E. Applications of multivariate
analysis of variance to repeated measurements experiments.
Biometrics, 1966, 22, 810-828.
Collier, R. O., Jr., Baker, F. B., Mandeville, G. K., & Hayes,
T. F. Estimates of test size for several test procedures based
on conventional variance ratios in the repeated measures design.
Psychometrika, 1967, 32, 339-353.
Danford, M. B., Hughes, H. M., & McNee, R. C. On the analysis of
repeated-measurements experiments. Biometrics, 1960, 16, 547-565.
Geisser, S. Multivariate analysis of variance for a special
covariance case. Journal of the American Statistical
Association, 1963, 58, 660-669.
Greenhouse, S. W., & Geisser, S. On methods in the analysis of
profile data. Psychometrika, 1959, 24, 95-112.
Imhof, J. P. Testing the hypothesis of no fixed main-effects in
Scheffé's mixed model. Annals of Mathematical Statistics, 1962,
33, 1085-1095.
Marks, E. Some profile methods useful with singular covariance
matrices. Psychological Bulletin, 1968, 70, 179-184.
Morrison, D. F. Multivariate statistical methods. New York:
McGraw-Hill, 1967.
Myers, J. L. Fundamentals of experimental design. Boston: Allyn
and Bacon, 1966.
Scheffé, H. The analysis of variance. New York: Wiley, 1959.
Tang, P. C. The power function of the analysis of variance tests
with tables and illustrations of their use. Statistical Research
Memoirs, 1938, 2, 126-149.
Winer, B. J. Statistical principles in experimental design. New
York: McGraw-Hill, 1962.


(Received September 25, 1970) 


Manuscripts Accepted for Publication in the 


Psychological Bulletin 


Looking for Configurality in Clinical Judgment. Norman H. Anderson (Department of Psychology, University of
California, San Diego, P. O. Box 109, La Jolla, California 92037).

The Difference between Electrical Stimulation of the Brain and Conventionally Reinforced Behavior: An
Associative Analysis. Irmingard I. Lenzer (Department of Psychology, Saint Mary's University, Halifax, Nova
Scotia, Canada).

Self-Disclosure: A Literature Review. [...] Minneapolis, Minnesota 55455).